BMC Microbiology (Sep 2021)

MetaGeneBank: a standardized database to study deep sequenced metagenomic data from human fecal specimen

  • Li Shao,
  • Jie Liao,
  • Jingyang Qian,
  • Wenbin Chen,
  • Xiaohui Fan

DOI
https://doi.org/10.1186/s12866-021-02321-z
Journal volume & issue
Vol. 21, no. 1
pp. 1 – 12

Abstract

Background: Microbiome big data from population-scale cohorts holds the key to unleashing the power of microbiomes to overcome critical challenges in disease control, treatment and precision medicine. However, variations introduced during data generation and processing limit the interpretability of comparisons among independent studies. Although multiple databases have been constructed as platforms for data reuse, they are of limited value because they consider only raw sequencing files.

Description: Here, we present MetaGeneBank, a standardized database that uses a unified data-processing pipeline to provide details on sample collection and sequencing, together with abundances of genes, microbiota and molecular functions, for 4470 raw sequencing files (over 12 TB) collected from 16 studies covering over 10 types of diseases and 14 countries. Built-in tools for browsing and searching by descriptive attributes, gene sequences, microbiota and functions make the database user-friendly. We found that the source of the specimens contributes more to variation in microbiota than sequencing centers or platforms do. Special attention should be paid when re-analyzing sequencing files from different countries.

Conclusions: Collectively, MetaGeneBank provides a gateway to the untapped potential of gut metagenomic data for fighting human diseases. With continuous updates in data volume, data types and sample types, MetaGeneBank would undoubtedly become the benchmark database for data reuse and would be valuable in translational science.

Keywords