Cell Reports: Methods (Nov 2023)

MaLiAmPi enables generalizable and taxonomy-independent microbiome features from technically diverse 16S-based microbiome studies

  • Samuel S. Minot,
  • Bailey Garb,
  • Alennie Roldan,
  • Alice S. Tang,
  • Tomiko T. Oskotsky,
  • Christopher Rosenthal,
  • Noah G. Hoffman,
  • Marina Sirota,
  • Jonathan L. Golob

Journal volume & issue
Vol. 3, no. 11
p. 100639

Abstract

Summary: For studies using microbiome data, the ability to combine data from technically and biologically distinct microbiome studies is crucial for supporting more robust and clinically relevant inferences. Formidable technical challenges arise when attempting to combine data from technically diverse 16S rRNA gene variable region amplicon sequencing (16S) studies. Closed operational taxonomic units and taxonomy are criticized as being heavily dependent upon reference sets and as having limited precision relative to the underlying biology. Phylogenetic placement has been demonstrated to be a promising taxonomy-free means of harmonizing microbiome data, but it has lacked a validated count-based feature suitable for use in machine learning and association studies. Here we introduce a phylogenetic-placement-based, taxonomy-independent, compositional feature of microbiota: phylotypes. Phylotypes were predictive of clinical outcomes such as obesity or pre-term birth on technically diverse independent validation sets harmonized post hoc. Thus, phylotypes enable the rigorous cross-validation of 16S-based clinical prognostic models and associative microbiome studies.

Motivation: A gold standard for statistical power and generalizability of microbiome research is to analyze large datasets representing heterogeneous populations, which can be accomplished by combining data from multiple studies. For 16S rRNA gene variable region amplicon-based microbiome studies, hundreds of thousands of already-sequenced specimens are available from public repositories, offering an opportunity to achieve this gold standard, but the use of these data is hampered by formidable technical challenges when combining data from technically diverse studies. To overcome these challenges, we developed phylotypes: a taxonomy-independent, stable compositional feature that is generalizable across technically diverse microbiome studies.

Keywords