Biomedicines (Oct 2024)

Correction of Batch Effect in Gut Microbiota Profiling of ASD Cohorts from Different Geographical Origins

  • Matteo Scanu
  • Federica Del Chierico
  • Riccardo Marsiglia
  • Francesca Toto
  • Silvia Guerrera
  • Giovanni Valeri
  • Stefano Vicari
  • Lorenza Putignani

DOI
https://doi.org/10.3390/biomedicines12102350
Journal volume & issue
Vol. 12, no. 10
p. 2350

Abstract

Background: Numerous metataxonomic studies have profiled the gut microbiota (GM) by analyzing data from public repositories. However, differences in study populations and in wet-lab and dry-lab pipelines have produced discordant results. Herein, we propose a biostatistical approach to remove these batch effects from GM characterization in autism spectrum disorders (ASDs). Methods: An original dataset of GM profiles from patients with ASD was ecologically characterized and compared with publicly available GM profiles of age-matched neurotypical controls (NCs). In addition, GM data from seven case–control studies on ASD were retrieved from the NCBI platform and included in the analysis. On each dataset, conditional quantile regression (CQR) was performed to reduce the batch effects originating from the technical and geographical confounders affecting the GM data. The method was then applied to the whole dataset matrix, obtained by merging all datasets. ASD GM markers were identified with a random forest (RF) model. Results: We observed a different GM profile in patients with ASD compared with NC subjects. Moreover, technical and geographical batch effects were significantly reduced in all datasets. We identified Bacteroides_H, Faecalibacterium, Gemmiger_A_73129, Blautia_A_141781, Bifidobacterium_388775, and Phocaeicola_A_858004 as robust GM bacterial biomarkers of ASD. Finally, our validation approach supported the validity of the CQR method, showing high values of accuracy, specificity, sensitivity, and AUC-ROC. Conclusions: We propose an updated biostatistical approach to reduce the technical and geographical batch effects that may negatively affect the description of bacterial composition in microbiota studies.

Keywords