Environmental Microbiome (Aug 2022)

Sequencing introduced false positive rare taxa lead to biased microbial community diversity, assembly, and interaction interpretation in amplicon studies

  • Yangyang Jia,
  • Shengguo Zhao,
  • Wenjie Guo,
  • Ling Peng,
  • Fang Zhao,
  • Lushan Wang,
  • Guangyi Fan,
  • Yuanfang Zhu,
  • Dayou Xu,
  • Guilin Liu,
  • Ruoqing Wang,
  • Xiaodong Fang,
  • He Zhang,
  • Karsten Kristiansen,
  • Wenwei Zhang,
  • Jianwei Chen

DOI
https://doi.org/10.1186/s40793-022-00436-y
Journal volume & issue
Vol. 17, no. 1
pp. 1 – 18

Abstract

Read online

Abstract Background Increasing studies have demonstrated potential disproportionate functional and ecological contributions of rare taxa in a microbial community. However, the study of the microbial rare biosphere is hampered by their inherent scarcity and the deficiency of currently available techniques. Sample-wise cross contaminations might be introduced by sample index misassignment in the most widely used metabarcoding amplicon sequencing approach. Although downstream bioinformatic quality control and clustering or denoising algorithms could remove sequencing errors and non-biological artifact reads, no algorithm could eliminate high quality reads from sample-wise cross contaminations introduced by index misassignment, making it difficult to distinguish between bona fide rare taxa and potential false positives in metabarcoding studies. Results We thoroughly evaluated the rate of index misassignment of the widely used NovaSeq 6000 and DNBSEQ-G400 sequencing platforms using both commercial and customized mock communities, and observed significant lower (0.08% vs. 5.68%) fraction of potential false positive reads for DNBSEQ-G400 as compared to NovaSeq 6000. Significant batch effects could be caused by stochastically introduced false positive or false negative rare taxa. These false detections could also lead to inflated alpha diversity of relatively simple microbial communities and underestimated that of complex ones. Further test using a set of cow rumen samples reported differential rare taxa by different sequencing platforms. Correlation analysis of the rare taxa detected by each sequencing platform demonstrated that the rare taxa identified by DNBSEQ-G400 platform had a much higher possibility to be correlated with the physiochemical properties of rumen fluid as compared to NovaSeq 6000 platform. Community assembly mechanism and microbial network correlation analysis indicated that false positive or negative rare taxa detection could lead to biased community assembly mechanism and identification of fake keystone species of the community. Conclusions We highly suggest proper positive/negative/blank controls, technical replicate settings, and proper sequencing platform selection in future amplicon studies, especially when the microbial rare biosphere would be focused.

Keywords