Machine Learning Data Analysis Highlights the Role of <i>Parasutterella</i> and <i>Alloprevotella</i> in Autism Spectrum Disorders
Daniele Pietrucci,
Adelaide Teofani,
Marco Milanesi,
Bruno Fosso,
Lorenza Putignani,
Francesco Messina,
Graziano Pesole,
Alessandro Desideri,
Giovanni Chillemi
Affiliations
Daniele Pietrucci
Department for Innovation in Biological, Agro-Food and Forest Systems (DIBAF), University of Tuscia, 01100 Viterbo, Italy
Adelaide Teofani
Department of Biology, University of Rome Tor Vergata, Via Montpellier 1, 00133 Rome, Italy
Marco Milanesi
Department for Innovation in Biological, Agro-Food and Forest Systems (DIBAF), University of Tuscia, 01100 Viterbo, Italy
Bruno Fosso
Department of Biosciences, Biotechnology and Biopharmaceutics, University of Bari “A. Moro”, Piazza Umberto I, 1, 70121 Bari, Italy
Lorenza Putignani
Unit of Microbiology and Diagnostic Immunology, Units of Microbiomics, Department of Diagnostic and Laboratory Medicine, Bambino Gesù Children’s Hospital, IRCCS, 00146 Rome, Italy
Francesco Messina
Laboratory of Microbiology and Biological Bank, National Institute for Infectious Diseases “Lazzaro Spallanzani”, Istituto di Ricovero e Cura a Carattere Scientifico, 00149 Rome, Italy
Graziano Pesole
Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, IBIOM, CNR, 70126 Bari, Italy
Alessandro Desideri
Department of Biology, University of Rome Tor Vergata, Via Montpellier 1, 00133 Rome, Italy
Giovanni Chillemi
Department for Innovation in Biological, Agro-Food and Forest Systems (DIBAF), University of Tuscia, 01100 Viterbo, Italy
In recent years, the involvement of the gut microbiota in health and disease has been investigated by sequencing the 16S rRNA gene from fecal samples. A dysbiotic gut microbiota has also been observed in Autism Spectrum Disorder (ASD), a neurodevelopmental disorder frequently accompanied by gastrointestinal symptoms. However, despite the considerable number of studies, a typical dysbiotic profile in ASD patients remains difficult to identify. The discrepancies among these studies are due to technical factors (e.g., experimental procedures) and external parameters (e.g., dietary habits). In this paper, we collected 959 samples from eight available projects (540 ASD and 419 healthy controls, HC) and reduced the bias observed among studies. We then applied a Machine Learning (ML) approach to build a predictor able to discriminate between ASD and HC. We tested and optimized three algorithms: Random Forest, Support Vector Machine and Gradient Boosting Machine. All three algorithms confirmed the importance of five genera, including <i>Parasutterella</i> and <i>Alloprevotella</i>. Furthermore, our results show that ML algorithms can identify common taxonomic features across datasets obtained from different countries, despite latent confounding variables.
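The abstract does not detail how the bias among studies was reduced. The following is a minimal sketch of one common option, not the authors' pipeline. It assumes genus-level count tables and a study label for each sample. Each sample is first passed through a centered log-ratio (CLR) transform, and each feature is then standardized within its own study. Leave-one-study-out splits check whether a classifier trained on some projects generalizes to a project it has never seen. The function names `clr`, `standardize_per_study` and `leave_one_study_out`, the pseudocount of 0.5, and the toy data are hypothetical illustrations.

```python
import math
from collections import defaultdict


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's genus counts.

    The pseudocount (hypothetical default 0.5) avoids log(0) for absent genera.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


def standardize_per_study(features, studies):
    """Z-score each feature within each study (a simple batch-effect mitigation).

    features: list of samples, each a list of floats (e.g. CLR values).
    studies:  list of study labels, one per sample.
    """
    by_study = defaultdict(list)
    for i, s in enumerate(studies):
        by_study[s].append(i)
    n_feat = len(features[0])
    out = [list(row) for row in features]
    for idx in by_study.values():
        for j in range(n_feat):
            col = [features[i][j] for i in idx]
            mu = sum(col) / len(col)
            sd = math.sqrt(sum((v - mu) ** 2 for v in col) / len(col))
            for i in idx:
                # A feature that is constant within a study is set to 0 (avoids dividing by 0).
                out[i][j] = (features[i][j] - mu) / sd if sd > 0 else 0.0
    return out


def leave_one_study_out(studies):
    """Yield (held_out_study, train_indices, test_indices), one fold per study."""
    for held_out in sorted(set(studies)):
        train = [i for i, s in enumerate(studies) if s != held_out]
        test = [i for i, s in enumerate(studies) if s == held_out]
        yield held_out, train, test


# Toy usage: 3 genera, 4 samples from two hypothetical projects "A" and "B".
counts = [[10, 0, 5], [20, 3, 1], [0, 7, 7], [4, 4, 12]]
studies = ["A", "A", "B", "B"]
X = standardize_per_study([clr(c) for c in counts], studies)
for study, train, test in leave_one_study_out(studies):
    print(study, train, test)
# prints:
# A [2, 3] [0, 1]
# B [0, 1] [2, 3]
```

Standardizing within each study removes study-level shifts. It can also remove genuine signal when the ASD/HC ratio differs strongly between projects, so it should be treated as one option among several, alongside dedicated batch-correction methods. The resulting matrix `X` and the fold indices can be passed to any of the three classifiers named above.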