Machine-learning analysis of cross-study samples according to the gut microbiome in 12 infant cohorts

Petri Vänni; Mysore V. Tejesvi; Niko Paalanne; Kjersti Aagaard; Gail Ackermann; Carlos A. Camargo; Merete Eggesbø; Kohei Hasegawa; Anne G. Hoen; Margaret R. Karagas; Kaija-Leena Kolho; Martin F. Laursen; Johnny Ludvigsson; Juliette Madan; Dennis Ownby; Catherine Stanton; Jakob Stokholm; Terhi Tapiainen

doi:10.1128/msystems.00364-23

mSystems (Dec 2023)

Affiliations

Petri Vänni: Research Unit of Clinical Medicine, University of Oulu, Oulu, Finland
Mysore V. Tejesvi: Research Unit of Clinical Medicine, University of Oulu, Oulu, Finland
Niko Paalanne: Research Unit of Clinical Medicine, University of Oulu, Oulu, Finland
Kjersti Aagaard: Department of Obstetrics & Gynecology, Division of Maternal-Fetal Medicine, Baylor College of Medicine and Texas Children’s Hospital, Houston, Texas, USA
Gail Ackermann: Department of Pediatrics, University of California, San Diego, California, USA
Carlos A. Camargo: Department of Emergency Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
Merete Eggesbø: Department of Climate and Environmental Health, Norwegian Institute of Public Health, Oslo, Norway
Kohei Hasegawa: Department of Emergency Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
Anne G. Hoen: Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, New Hampshire, USA
Margaret R. Karagas: Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, New Hampshire, USA
Kaija-Leena Kolho: Children’s Hospital, University of Helsinki and HUS, Helsinki, Finland
Martin F. Laursen: National Food Institute, Technical University of Denmark, Lyngby, Denmark
Johnny Ludvigsson: Crown Princess Victoria Children’s Hospital and Division of Pediatrics, Department of Biomedical and Clinical Sciences, Linköping University, Linköping, Sweden
Juliette Madan: Department of Psychiatry, Dartmouth Hitchcock Medical Center, Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire, USA
Dennis Ownby: Medical College of Georgia, Augusta, Georgia, USA
Catherine Stanton: Teagasc Food Research Centre & APC Microbiome Ireland, Moorepark, Fermoy, Co. Cork, Ireland
Jakob Stokholm: Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
Terhi Tapiainen: Research Unit of Clinical Medicine, University of Oulu, Oulu, Finland

DOI: https://doi.org/10.1128/msystems.00364-23
Journal volume & issue: Vol. 8, no. 6

Abstract

Combining and comparing microbiome data from distinct infant cohorts has been challenging because such data are inherently multidimensional and complex. Here, we used an ensemble of machine-learning (ML) models and studied 16S rRNA amplicon sequencing data from 4,099 gut microbiome samples representing 12 prospectively collected infant cohorts. We chose the childbirth delivery mode as a starting point for such analysis because it has previously been associated with alterations in the gut microbiome in infants. In cross-study ensemble models, Bacteroides was the most important feature in all machine-learning models. The predictive capacity by taxonomy varied with age. At the age of 1–2 months, gut microbiome data were able to predict delivery mode with an area under the curve of 0.72 to 0.83. In contrast, ML models trained on taxa were not able to differentiate between the modes of delivery, in any of the cohorts, when the infants were between 3 and 12 months of age. Moreover, no ML model, alternately trained on the functional pathways of the infant gut microbiome, could consistently predict mode of delivery at any infant age. This study shows that infant gut microbiome data sets can be effectively combined with the application of ML analysis across different study populations.

Importance

There are challenges in merging microbiome data from diverse research groups due to the intricate and multifaceted nature of such data. To address this, we utilized a combination of machine-learning (ML) models to analyze 16S sequencing data from a substantial set of gut microbiome samples, sourced from 12 distinct infant cohorts that were gathered prospectively. Our initial focus was on the mode of delivery due to its prior association with changes in infant gut microbiomes. Through ML analysis, we demonstrated the effective merging and comparison of various gut microbiome data sets, facilitating the identification of robust microbiome biomarkers applicable across varied study populations.
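
The abstract reports performance as an area under the ROC curve (AUC) for an ensemble of models. As a reading aid, the sketch below shows how such a number can be computed from predicted scores. It is not the authors' pipeline: the abstract does not say how the ensemble members were combined, so simple score averaging is assumed here, and the labels, scores, and the function names `roc_auc` and `ensemble_scores` are hypothetical.

```python
from itertools import product


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def ensemble_scores(per_model_scores):
    """Average the per-sample scores of several models.
    Assumption: plain averaging; the source does not specify the rule."""
    return [sum(col) / len(col) for col in zip(*per_model_scores)]


# Hypothetical toy data: 1 = one delivery mode, 0 = the other.
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.4, 0.7, 0.3, 0.6, 0.2]  # made-up scores from one model
model_b = [0.8, 0.6, 0.3, 0.4, 0.2, 0.1]  # made-up scores from another

combined = ensemble_scores([model_a, model_b])
print("model A AUC:", roc_auc(labels, model_a))
print("model B AUC:", roc_auc(labels, model_b))
print("ensemble AUC:", roc_auc(labels, combined))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so the reported range of 0.72 to 0.83 at 1–2 months sits between the two.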

Published in mSystems

ISSN: 2379-5077 (Online)
Publisher: American Society for Microbiology
Country of publisher: United States
LCC subjects: Science: Microbiology
Website: https://journals.asm.org/journal/msystems
