Metagenomics refers to the survey of all microorganisms present in a sample. Samples are usually of animal or human origin (stool, saliva, blood, tissue), but environmental samples such as sewage and pond water are also studied.

At TAGC we have conducted a wide variety of metagenomics projects, including human gut, mouse gut and diverse human clinical samples. Typical analysis of metagenomics data includes quality control of sequences and taxonomic classification against curated reference databases. Once the relative abundances of the different taxa are computed, they are compared among groups using statistical tests such as Fisher's exact test, the Mann-Whitney test and the Kruskal-Wallis test, together with linear discriminant analysis (LDA), permutational multivariate analysis of variance (PERMANOVA), and regression and correlation analyses.
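As a minimal sketch of the comparison step described above, the following Python code converts raw read counts per taxon into relative abundances and computes the Mann-Whitney U statistic for one taxon between two groups. The function names, taxon names and numbers are hypothetical and chosen for illustration only. The U statistic is computed directly from its pairwise definition, and deriving a p value from it is left to a statistics package.

```python
def relative_abundance(counts):
    """Convert raw read counts per taxon into fractions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def mann_whitney_u(x, y):
    """Mann-Whitney U statistic for group x versus group y.

    Counts the pairs (xi, yj) with xi > yj; a tied pair counts as 0.5.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Hypothetical sample: read counts assigned to two taxa.
sample = {"Bacteroides": 30, "Prevotella": 10}
fractions = relative_abundance(sample)  # {"Bacteroides": 0.75, "Prevotella": 0.25}

# Hypothetical relative abundances of one taxon in two groups of samples.
group_a = [0.1, 0.2, 0.3]
group_b = [0.4, 0.5, 0.6]
u_a = mann_whitney_u(group_a, group_b)  # 0.0: no value in A exceeds any value in B
```

The two U statistics always sum to the number of pairs, so a value close to 0 or close to that maximum indicates strong separation between the groups.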


In RNAseq, transcripts are converted to cDNA, amplified and sequenced to produce a snapshot of gene expression. Essentially, analysis of RNAseq data entails estimating transcript abundance with algorithms based on maximum likelihood and/or expectation maximization. Once count estimates are obtained, transcriptome profiles are inspected through unsupervised clustering analysis and ordination techniques such as principal component analysis (PCA) or multidimensional scaling (MDS).
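To illustrate how expectation maximization can estimate transcript abundance when some reads are compatible with more than one transcript, here is a toy sketch in Python. It is not the algorithm of any particular quantification tool: it ignores transcript length, sequencing errors and fragment bias, and the function name and input layout (a list of compatible transcript indices per read) are assumptions made for this example.

```python
def em_abundance(read_compat, n_transcripts, n_iter=100):
    """Estimate relative transcript abundances with a toy EM loop.

    read_compat: one list per read, holding the indices of the
                 transcripts that the read is compatible with.
    Returns a list of abundances that sums to 1.
    """
    # Start from a uniform guess.
    theta = [1.0 / n_transcripts] * n_transcripts
    for _ in range(n_iter):
        expected = [0.0] * n_transcripts
        # E-step: split each read among its compatible transcripts
        # in proportion to the current abundance estimates.
        for compat in read_compat:
            total = sum(theta[t] for t in compat)
            for t in compat:
                expected[t] += theta[t] / total
        # M-step: abundances are the normalized expected read counts.
        n_reads = sum(expected)
        theta = [c / n_reads for c in expected]
    return theta


# Hypothetical data: 6 reads unique to transcript 0, 2 unique to
# transcript 1, and 2 reads compatible with both.
reads = [[0]] * 6 + [[1]] * 2 + [[0, 1]] * 2
abundances = em_abundance(reads, n_transcripts=2)
```

In this example the ambiguous reads are progressively assigned in the same 3:1 ratio as the unique reads, so the estimates settle near 0.75 and 0.25.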

Differential expression analysis is conducted with software that models the distribution of the count data and normalizes it accordingly. Differences between groups are assessed for each feature with a statistical test suited to count data, for example an exact test analogous to Fisher's exact test. The resulting p values are then adjusted for multiple testing to control the false discovery rate (FDR). Differentially expressed genes can then be analyzed collectively and organized into pathways (pathway analysis) or into terms (gene ontology analysis).
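The multiple-testing correction mentioned above is commonly done with the Benjamini-Hochberg procedure. The sketch below assumes that method; the source does not state which correction is used, and the function name and example p values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, scaling each by m / rank and
    # enforcing that adjusted values never increase as the rank decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw)  # approximately [0.02, 0.04, 0.04, 0.02]
```

Genes whose adjusted value falls below the chosen threshold (0.05 is a common choice) would be reported as differentially expressed at that FDR.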

Genome Sequencing

In genome sequencing, a large fraction of a genome (very rarely the whole genome) is sequenced. The larger the genome under study, the more difficult it is to sequence in its entirety. Usually, the genome is first fragmented enzymatically (e.g. tagmentation, restriction enzymes), chemically (e.g. divalent metal cations), or physically (e.g. sonication, hydrodynamic shearing). Once the genome (or parts thereof) is sequenced, the reads can be aligned to a reference genome in a procedure called reference-based assembly, or, when no suitable reference is available, assembled with more complex de novo methods. TAGC has experience in sequencing the genomes of bacteria, fungi and other small eukaryotes.

More recently, Oxford Nanopore Technologies has developed a sequencing technology in which long strands of DNA are read as they pass through protein nanopores. Such long reads can be combined with short NGS reads (which are normally of higher quality) in a process called hybrid assembly. TAGC has used Nanopore technology to sequence bacterial and small eukaryotic genomes.

Virus Discovery

Because NGS technologies are nonspecific, they allow the discovery of new viruses through the identification of DNA molecules from clinical and environmental samples. Sequences can be assembled and queried against a generalist protein database. Sequences of viral origin are expected to show considerable similarity to known viral proteins. Once viral sequences are identified, the whole virus genome can be recovered from the original samples using conventional molecular biology techniques.
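Querying nucleotide sequences against a protein database requires translating each sequence in all six reading frames, a step that translated-search tools normally handle internally. The Python sketch below shows only that translation step, assuming the standard genetic code; the function names and the example sequence are hypothetical, and the database search itself is not shown.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Translate a DNA sequence in the three forward and three reverse frames."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


# Hypothetical contig fragment.
frames = six_frame_translations("ATGGCC")
# frames["+1"] is "MA" and frames["-1"] is "GH"
```

Each of the six protein strings would then be compared against the protein database, and contigs with strong matches to viral proteins in any frame would be kept as candidate viral sequences.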

Custom Bioinformatics

TAGC can develop new bioinformatics methods, including custom software. Part of our mandate is to work with researchers to advance their research programs in this area.