Optimizing and evaluating the reconstruction of Metagenome-assembled microbial genomes

BMC Genomics. 2017;18(1):1-13 DOI 10.1186/s12864-017-4294-1

Journal Title: BMC Genomics

ISSN: 1471-2164 (Online)

Publisher: BMC

LCC Subject Category: Technology: Chemical technology: Biotechnology | Science: Biology (General): Genetics

Country of publisher: United Kingdom

Language of fulltext: English


Bhavya Papudeshi (Bioinformatics and Medical Informatics, San Diego State University)
J. Matthew Haggerty (Department of Biology, San Diego State University)
Michael Doane (Department of Biology, San Diego State University)
Megan M. Morris (Department of Biology, San Diego State University)
Kevin Walsh (Department of Biology, San Diego State University)
Douglas T. Beattie (Department of Biology, University of New South Wales)
Dnyanada Pande (Bioinformatics and Medical Informatics, San Diego State University)
Parisa Zaeri (Department of Mathematics and Statistics, San Diego State University)
Genivaldo G. Z. Silva (Computational Science Research Center, San Diego State University)
Fabiano Thompson (Institute of Biology, Federal University of Rio de Janeiro (UFRJ))
Robert A. Edwards (Department of Computer Science, San Diego State University)
Elizabeth A. Dinsdale (Department of Biology, San Diego State University)


Blind peer review

Time From Submission to Publication: 17 weeks

Abstract

Background
Microbiome/host interactions describe characteristics that affect the host's health. Shotgun metagenomics involves sequencing a random subset of the microbiome to analyze its taxonomic and metabolic potential. Reconstructing genomes from metagenomic DNA fragments (yielding so-called metagenome-assembled genomes) assigns unknown fragments to taxa and functions and facilitates the discovery of novel organisms. Genome reconstruction combines sequence assembly with sorting of the assembled sequences into bins, each characteristic of a genome. However, the composition of the microbial community, including its taxonomic and phylogenetic diversity, may influence genome reconstruction. We determine the optimal reconstruction method for four microbiome projects that varied in sequencing platform (IonTorrent and Illumina), diversity (high or low), and environment (coral reefs and kelp forests), using a set of parameters to select optimal assembly and binning tools.

Methods
We tested the effects of the assembly and binning processes on population genome reconstruction using 105 marine metagenomes from 4 projects. Reconstructed genomes were obtained from each project using 3 assemblers (IDBA, MetaVelvet, and SPAdes) and 2 binning tools (GroopM and MetaBat). We assessed the efficiency of the assemblers using statistics including contig continuity and contig chimerism, and the effectiveness of the binning tools using genome completeness and taxonomic identification.

Results
We concluded that SPAdes assembled more contigs (143,718 ± 124) of longer length (N50 = 1632 ± 108 bp) and incorporated the most sequences (sequences assembled = 19.65%). Microbial richness and evenness were maintained across the assembly, suggesting few chimeric contigs. Compared with the other assemblers, SPAdes assembly was more responsive to the biological and technological variation within each project.
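The contig-continuity statistic reported for the assemblers, N50, has a standard definition that a short sketch makes concrete: sort contigs from longest to shortest and report the length at which the running total first reaches half of all assembled bases. The code below is a minimal illustration, not the authors' pipeline. `percent_assembled` is a hypothetical helper that assumes "sequences assembled" means the share of input reads that end up in contigs (for example, counted by mapping reads back to the assembly); the abstract does not spell out how that figure was computed.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembled bases."""
    if not contig_lengths:
        raise ValueError("no contigs")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating assembled bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:  # half of the total reached
            return length
    raise AssertionError("unreachable for non-empty, non-negative input")


def percent_assembled(reads_in_contigs: int, total_reads: int) -> float:
    """Hypothetical helper (assumed definition): percentage of input
    reads that were incorporated into contigs."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * reads_in_contigs / total_reads


if __name__ == "__main__":
    lengths = [5000, 3000, 2000, 1000, 500, 500]  # toy contig lengths (bp)
    print(n50(lengths))                    # 3000
    print(percent_assembled(1965, 10000))  # 19.65
```

In the toy example the two longest contigs (5000 + 3000 bp) already cover 8000 of 12000 bases, so the N50 is 3000 bp; a higher N50 means the assembly is concentrated in longer, more continuous contigs.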
Among the binning tools, we conclude that MetaBat produced bins with less variation in GC content (average standard deviation: 1.49), lower species richness (4.91 ± 0.66), and higher genome completeness (40.92 ± 1.75) across all projects. MetaBat extracted 115 bins from the 4 projects, of which 66 were identified as reconstructed metagenome-assembled genomes with sequences belonging to a specific genus. We identified 13 novel genomes, some of which were 100% complete but showed low similarity to genomes in databases.

Conclusions
We present a set of biologically relevant parameters for evaluating and selecting optimal assembly and binning tools. Among the tools we tested, the SPAdes assembler and the MetaBat binning tool reconstructed quality metagenome-assembled genomes for all four projects. We also conclude that metagenomes from microbial communities with high coverage of phylogenetically distinct taxa and low taxonomic diversity yield the highest-quality metagenome-assembled genomes.
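MetaBat was preferred partly because its bins showed less variation in GC content and lower species richness, both signs that a bin holds a single genome. One way to compute such per-bin summaries is sketched below under stated assumptions: GC content is taken per contig (ignoring ambiguous bases such as N), the spread is the population standard deviation within a bin, bin-level values are averaged without weighting by contig length, and richness is the count of distinct taxon labels in a bin. The function names, labels, and toy sequences are hypothetical; the abstract does not state how the authors computed these values.

```python
from __future__ import annotations

from statistics import mean, pstdev


def gc_percent(seq: str) -> float:
    """GC content of one contig in percent; non-ACGT bases are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def bin_gc_spread(contigs: list[str]) -> tuple[float, float]:
    """Mean and population standard deviation of contig GC% in one bin.
    A small spread is consistent with the bin holding a single genome."""
    values = [gc_percent(contig) for contig in contigs]
    return mean(values), pstdev(values)


def average_bin_gc_sd(bins: dict[str, list[str]]) -> float:
    """Hypothetical summary: unweighted mean of per-bin GC standard deviations."""
    return mean(bin_gc_spread(contigs)[1] for contigs in bins.values())


def bin_richness(taxon_labels: list[str]) -> int:
    """Number of distinct taxon labels assigned to the contigs of one bin."""
    return len(set(taxon_labels))


if __name__ == "__main__":
    bins = {
        "bin1": ["GGCCAATT", "GCGCATAT"],  # both contigs 50% GC
        "bin2": ["ATATATGC", "GGGGCCCC"],  # 25% and 100% GC
    }
    print(average_bin_gc_sd(bins))  # 18.75
    print(bin_richness(["Vibrio", "Vibrio", "Synechococcus"]))  # 2
```

Here `bin1` has a GC standard deviation of 0 and `bin2` of 37.5, so the mixed composition of `bin2` stands out; weighting contigs by length would be an equally reasonable design choice for real data.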