Water (Nov 2022)

Challenges of Comparing Marine Microbiome Community Composition Data Provided by Different Commercial Laboratories and Classification Databases

  • Monika Mioduchowska,
  • Anna Iglikowska,
  • Jan P. Jastrzębski,
  • Anna-Karina Kaczorowska,
  • Ewa Kotlarska,
  • Artur Trzebny,
  • Agata Weydmann-Zwolicka

DOI
https://doi.org/10.3390/w14233855
Journal volume & issue
Vol. 14, no. 23
p. 3855

Abstract


In the high-throughput sequencing (HTS) era, metabarcoding based on the V3–V4 hypervariable region of the bacterial 16S rRNA gene requires sophisticated bioinformatics pipelines and validated methods that allow researchers to compare their data with confidence. Many commercial laboratories conduct extensive HTS analyses; however, no information is available on whether the results generated by these vendors are consistent. In our study, we compared the sequencing data generated by three commercial laboratories for the same marine microbiome community sample. As a sequencing control for detecting differences between the laboratories and between two 16S rRNA databases, we also analyzed a “mock community” containing a defined number of microbial species. We assessed how the choice between two commonly used 16S rRNA databases, Greengenes and SILVA, affects downstream data analysis, including taxonomic assignment. We demonstrated that the final results depend on both the laboratory conducting the HTS and the reference database of ribosomal sequences. The number of amplicon sequence variants (ASVs) produced ranged from 137 to 564. Different putative bacterial endosymbionts were identified depending on which 16S rRNA database was applied. These results may be of particular interest to researchers planning microbiome community analyses with the 16S rRNA marker gene, including the identification of putative bacterial endosymbionts, and may serve as a guide for choosing the pipeline that yields the most accurate and reproducible data.
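The two comparisons the abstract describes can be sketched as simple set operations: checking whether the same ASVs receive the same genus label when classified against two reference databases, and checking which expected mock-community members were recovered. This is a minimal, hypothetical sketch, not the authors' pipeline. The function names `assignment_agreement` and `mock_recovery`, the ASV IDs, and all genus labels are invented for illustration, and it assumes taxonomy has already been assigned upstream (e.g., by a classifier trained on Greengenes or SILVA).

```python
"""Hypothetical sketch: compare per-ASV taxonomy from two reference
databases and check recovery of a mock community.
All ASV IDs and genus labels below are invented for illustration."""


def assignment_agreement(db_a, db_b):
    """Return (number of shared ASVs, number with identical labels,
    sorted list of ASV IDs whose labels differ).

    db_a, db_b: dict mapping ASV ID -> genus label, or None if unassigned.
    Only ASVs present in both mappings are compared.
    """
    shared = sorted(set(db_a) & set(db_b))
    disagree = [asv for asv in shared if db_a[asv] != db_b[asv]]
    return len(shared), len(shared) - len(disagree), disagree


def mock_recovery(expected, observed):
    """Compare genera expected in a mock community with genera detected.

    Returns (recovered, missed, unexpected) as sorted lists.
    """
    expected, observed = set(expected), set(observed)
    return (sorted(expected & observed),
            sorted(expected - observed),
            sorted(observed - expected))


if __name__ == "__main__":
    # Invented example: the same four ASVs labelled using two databases.
    labels_db_a = {"ASV1": "Pseudomonas", "ASV2": "Vibrio",
                   "ASV3": None, "ASV4": "Wolbachia"}
    labels_db_b = {"ASV1": "Pseudomonas", "ASV2": "Aliivibrio",
                   "ASV3": "Rickettsia", "ASV4": "Wolbachia"}
    print(assignment_agreement(labels_db_a, labels_db_b))
    # (4, 2, ['ASV2', 'ASV3'])

    # Invented mock community: expected members vs. genera detected.
    expected = ["Escherichia", "Pseudomonas", "Bacillus", "Staphylococcus"]
    observed = ["Escherichia", "Pseudomonas", "Bacillus", "Ralstonia"]
    print(mock_recovery(expected, observed))
    # (['Bacillus', 'Escherichia', 'Pseudomonas'], ['Staphylococcus'], ['Ralstonia'])
```

Working at the genus level keeps the sketch simple; in practice the two databases also differ in nomenclature and taxonomic depth, so label mismatches may reflect naming conventions rather than genuinely different classifications.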

Keywords