PLoS ONE (Jan 2023)

The selection of software and database for metagenomics sequence analysis impacts the outcome of microbial profiling and pathogen detection.

  • Ruijie Xu,
  • Sreekumari Rajeev,
  • Liliana C M Salvador

DOI
https://doi.org/10.1371/journal.pone.0284031
Journal volume & issue
Vol. 18, no. 4
p. e0284031

Abstract

Shotgun metagenomic sequencing analysis is widely used for microbial profiling of biological specimens and for pathogen detection. However, very little is known about the technical biases that the choice of analysis software and databases introduces into the profile of a biological specimen. In this study, we evaluated different direct-read shotgun metagenomics taxonomic profiling software to characterize, across multiple taxonomic levels, the microbial compositions of simulated mouse gut microbiome samples and of biological samples collected from wild rodents. Using ten of the most widely used metagenomics software programs and four different databases, we demonstrated that obtaining an accurate species-level microbial profile with current direct-read metagenomics profiling software is still a challenging task. We also showed that the discrepancies in results when different databases and software were used could lead to significant variations in the distinct microbial taxa classified, in the characterizations of the microbial communities, and in the differentially abundant taxa identified. Differences in database contents and read profiling algorithms are the main contributors to these discrepancies. Including host genomes and genomes of the taxa of interest in the databases is important for increasing the accuracy of profiling. Our analysis also showed that the software included in this study differed in their ability to detect the presence of Leptospira, a major zoonotic pathogen of One Health importance, especially at species-level resolution. We concluded that using different database and software combinations can result in confounding biological conclusions in microbial profiling. Our study indicates that software and database selection must be based on the purpose of the study.