mSystems (Apr 2023)

Vaginal Microbiome Metagenome Inference Accuracy: Differential Measurement Error according to Community Composition

  • Kayla A. Carter,
  • Anthony A. Fodor,
  • Jennifer E. Balkus,
  • Angela Zhang,
  • Myrna G. Serrano,
  • Gregory A. Buck,
  • Stephanie M. Engel,
  • Michael C. Wu,
  • Shan Sun

DOI
https://doi.org/10.1128/msystems.01003-22
Journal volume & issue
Vol. 8, no. 2

Abstract


ABSTRACT Several studies have compared metagenome inference performance in different human body sites; however, none specifically reported on the vaginal microbiome. Findings from other body sites cannot easily be generalized to the vaginal microbiome due to unique features of vaginal microbial ecology, and investigators seeking to use metagenome inference in vaginal microbiome research are “flying blind” with respect to potential bias these methods may introduce into analyses. We compared the performance of PICRUSt2 and Tax4Fun2 using paired 16S rRNA gene amplicon sequencing and whole-metagenome sequencing data from vaginal samples from 72 pregnant individuals enrolled in the Pregnancy, Infection, and Nutrition (PIN) cohort. Participants were selected from those with known birth outcomes and adequate 16S rRNA gene amplicon sequencing data in a case-control design. Cases experienced early preterm birth (<32 weeks of gestation), and controls experienced term birth (37 to 41 weeks of gestation). PICRUSt2 and Tax4Fun2 performed modestly overall (median Spearman correlation coefficients between observed and predicted KEGG ortholog [KO] relative abundances of 0.20 and 0.22, respectively). Both methods performed best among Lactobacillus crispatus-dominated vaginal microbiotas (median Spearman correlation coefficients of 0.24 and 0.25, respectively) and worst among Lactobacillus iners-dominated microbiotas (median Spearman correlation coefficients of 0.06 and 0.11, respectively). The same pattern was observed when evaluating correlations between univariable hypothesis test P values generated with observed and predicted metagenome data. Differential metagenome inference performance across vaginal microbiota community types can be considered differential measurement error, which often causes differential misclassification. As such, metagenome inference will introduce hard-to-predict bias (toward or away from the null) in vaginal microbiome research. 
IMPORTANCE Compared to taxonomic composition, the functional potential within a bacterial community is more relevant to establishing mechanistic understandings and causal relationships between the microbiome and health outcomes. Metagenome inference attempts to bridge the gap between 16S rRNA gene amplicon sequencing and whole-metagenome sequencing by predicting a microbiome’s gene content based on its taxonomic composition and annotated genome sequences of its members. Metagenome inference methods have been evaluated primarily among gut samples, where they appear to perform fairly well. Here, we show that metagenome inference performance is markedly worse for the vaginal microbiome and that performance varies across common vaginal microbiome community types. Because these community types are associated with sexual and reproductive outcomes, differential metagenome inference performance will bias vaginal microbiome studies, obscuring relationships of interest. Results from such studies should be interpreted with substantial caution and the understanding that they may over- or underestimate associations with metagenome content.
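
The performance metric reported in the abstract (median Spearman correlation between observed and predicted KO relative abundances, overall and within each community type) can be illustrated with a minimal sketch. This is not the authors' code. It assumes the correlation is computed within each sample across KOs, which the abstract does not specify, and that KOs missing from one profile count as zero abundance. All function names, sample IDs, KO IDs, and group labels below are hypothetical.

```python
from collections import defaultdict
from statistics import median


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def median_spearman_by_group(observed, predicted, groups):
    """Median per-sample Spearman correlation within each group.

    observed, predicted: {sample_id: {ko_id: relative_abundance}}
    groups: {sample_id: community_type_label}
    """
    per_group = defaultdict(list)
    for sample, obs in observed.items():
        pred = predicted[sample]
        kos = sorted(set(obs) | set(pred))  # assumption: absent KO = 0.0
        rho = spearman([obs.get(k, 0.0) for k in kos],
                       [pred.get(k, 0.0) for k in kos])
        if rho == rho:  # skip undefined (NaN) correlations
            per_group[groups[sample]].append(rho)
    return {g: median(r) for g, r in per_group.items()}


# Toy data (hypothetical): one sample predicted in the right rank order,
# one predicted in the reverse order.
observed = {"s1": {"K1": 0.5, "K2": 0.3, "K3": 0.2},
            "s2": {"K1": 0.5, "K2": 0.3, "K3": 0.2}}
predicted = {"s1": {"K1": 0.6, "K2": 0.3, "K3": 0.1},
             "s2": {"K1": 0.1, "K2": 0.3, "K3": 0.6}}
groups = {"s1": "type_A", "s2": "type_B"}

result = median_spearman_by_group(observed, predicted, groups)
print(result)
```

Comparing the per-group medians is what exposes differential performance: if inference tracks the observed metagenome well in one community type and poorly in another, the measurement error depends on the exposure of interest.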

Keywords