Applications of Machine Learning in Human Microbiome Studies: A Review on Feature Selection, Biomarker Identification, Disease Prediction and Treatment

Laura Judith Marcos-Zambrano; Kanita Karaduzovic-Hadziabdic; Tatjana Loncar Turukalo; Piotr Przymus; Vladimir Trajkovik; Oliver Aasmets; Magali Berland; Aleksandra Gruca; Jasminka Hasic; Karel Hron; Thomas Klammsteiner; Mikhail Kolev; Leo Lahti; Marta B. Lopes; Victor Moreno; Irina Naskinova; Elin Org; Inês Paciência; Georgios Papoutsoglou; Rajesh Shigdel; Blaz Stres; Baiba Vilne; Malik Yousef; Eftim Zdravevski; Ioannis Tsamardinos; Enrique Carrillo de Santa Pau; Marcus J. Claesson; Isabel Moreno-Indias; Jaak Truu

doi:10.3389/fmicb.2021.634511

Frontiers in Microbiology (Feb 2021)

Affiliations

Laura Judith Marcos-Zambrano: Computational Biology Group, Precision Nutrition and Cancer Research Program, IMDEA Food Institute, Madrid, Spain
Kanita Karaduzovic-Hadziabdic: Faculty of Engineering and Natural Sciences, International University of Sarajevo, Sarajevo, Bosnia and Herzegovina
Tatjana Loncar Turukalo: Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia
Piotr Przymus: Faculty of Mathematics and Computer Science, Nicolaus Copernicus University, Toruń, Poland
Vladimir Trajkovik: Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje, North Macedonia
Oliver Aasmets: Institute of Genomics, Estonian Genome Centre, University of Tartu, Tartu, Estonia
Oliver Aasmets: Department of Biotechnology, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
Magali Berland: Université Paris-Saclay, INRAE, MGP, Jouy-en-Josas, France
Aleksandra Gruca: Department of Computer Networks and Systems, Silesian University of Technology, Gliwice, Poland
Jasminka Hasic: University Sarajevo School of Science and Technology, Sarajevo, Bosnia and Herzegovina
Karel Hron: Department of Mathematical Analysis and Applications of Mathematics, Palacký University, Olomouc, Czechia
Thomas Klammsteiner: Department of Microbiology, University of Innsbruck, Innsbruck, Austria
Mikhail Kolev: South West University “Neofit Rilski”, Blagoevgrad, Bulgaria
Leo Lahti: Department of Computing, University of Turku, Turku, Finland
Marta B. Lopes: NOVA Laboratory for Computer Science and Informatics (NOVA LINCS), FCT, UNL, Caparica, Portugal
Marta B. Lopes: Centro de Matemática e Aplicações (CMA), FCT, UNL, Caparica, Portugal
Victor Moreno: Oncology Data Analytics Program, Catalan Institute of Oncology (ICO), Barcelona, Spain
Victor Moreno: Colorectal Cancer Group, Institut de Recerca Biomedica de Bellvitge (IDIBELL), Barcelona, Spain
Victor Moreno: Consortium for Biomedical Research in Epidemiology and Public Health (CIBERESP), Barcelona, Spain
Victor Moreno: Department of Clinical Sciences, Faculty of Medicine, University of Barcelona, Barcelona, Spain
Irina Naskinova: South West University “Neofit Rilski”, Blagoevgrad, Bulgaria
Elin Org: Institute of Genomics, Estonian Genome Centre, University of Tartu, Tartu, Estonia
Inês Paciência: EPIUnit – Instituto de Saúde Pública da Universidade do Porto, Porto, Portugal
Georgios Papoutsoglou: Department of Computer Science, University of Crete, Heraklion, Greece
Rajesh Shigdel: Department of Clinical Science, University of Bergen, Bergen, Norway
Blaz Stres: Group for Microbiology and Microbial Biotechnology, Department of Animal Science, University of Ljubljana, Ljubljana, Slovenia
Baiba Vilne: Bioinformatics Research Unit, Riga Stradins University, Riga, Latvia
Malik Yousef: Department of Information Systems, Zefat Academic College, Zefat, Israel
Malik Yousef: Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat, Israel
Eftim Zdravevski: Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje, North Macedonia
Ioannis Tsamardinos: Department of Computer Science, University of Crete, Heraklion, Greece
Enrique Carrillo de Santa Pau: Computational Biology Group, Precision Nutrition and Cancer Research Program, IMDEA Food Institute, Madrid, Spain
Marcus J. Claesson: School of Microbiology & APC Microbiome Ireland, University College Cork, Cork, Ireland
Isabel Moreno-Indias: Unidad de Gestión Clínica de Endocrinología y Nutrición, Instituto de Investigación Biomédica de Málaga (IBIMA), Hospital Clínico Universitario Virgen de la Victoria, Universidad de Málaga, Málaga, Spain
Isabel Moreno-Indias: Centro de Investigación Biomédica en Red de Fisiopatología de la Obesidad y la Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, Spain
Jaak Truu: Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia

DOI: https://doi.org/10.3389/fmicb.2021.634511
Journal volume & issue: Vol. 12

Abstract

The growing number of microbiome-related studies has notably increased the availability of data on human microbiome composition and function. These studies provide the essential material to explore host-microbiome associations in depth, along with their relation to the development and progression of various complex diseases. Improved data-analytical tools are needed to exploit all the information in these biological datasets while taking into account the peculiarities of microbiome data, i.e., the compositional, heterogeneous, and sparse nature of these datasets. Predicting host phenotypes from taxonomy-informed feature selection, in order to establish associations between the microbiome and disease states, is beneficial for personalized medicine. In this regard, machine learning (ML) provides new insights into the development of models that can be used for classification and prediction in microbiology, to infer host phenotypes and predict diseases, and to stratify patients using microbial communities characterized by state-specific microbial signatures. Here we review the state-of-the-art ML methods and respective software applied in human microbiome studies, a review performed as part of the COST Action ML4Microbiome activities. This scoping review focuses on the application of ML in microbiome studies related to association and clinical use for diagnostics, prognostics, and therapeutics. Although the data presented here relate mostly to the bacterial community, many of the algorithms could be applied in general, regardless of the feature type. This literature and software review, which covers a broad topic, follows the scoping review methodology.
The manual identification of data sources has been complemented with: (1) an automated publication search through the digital libraries of the three major publishers using a natural language processing (NLP) toolkit, and (2) an automated identification of relevant software repositories on GitHub and a ranking of the related research papers relying on a learning-to-rank approach.

Published in Frontiers in Microbiology

ISSN: 1664-302X (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Science: Microbiology
Website: http://www.frontiersin.org/journals/microbiology
