Communications Medicine (May 2024)

Machine learning-based health environmental-clinical risk scores in European children

  • Jean-Baptiste Guimbaud,
  • Alexandros P. Siskos,
  • Amrit Kaur Sakhi,
  • Barbara Heude,
  • Eduard Sabidó,
  • Eva Borràs,
  • Hector Keun,
  • John Wright,
  • Jordi Julvez,
  • Jose Urquiza,
  • Kristine Bjerve Gützkow,
  • Leda Chatzi,
  • Maribel Casas,
  • Mariona Bustamante,
  • Mark Nieuwenhuijsen,
  • Martine Vrijheid,
  • Mónica López-Vicente,
  • Montserrat de Castro Pascual,
  • Nikos Stratakis,
  • Oliver Robinson,
  • Regina Grazuleviciene,
  • Remy Slama,
  • Silvia Alemany,
  • Xavier Basagaña,
  • Marc Plantevit,
  • Rémy Cazabet,
  • Léa Maitre

DOI
https://doi.org/10.1038/s43856-024-00513-y
Journal volume & issue
Vol. 4, no. 1
pp. 1 – 14

Abstract

Background: Early life environmental stressors play an important role in the development of multiple chronic disorders. Previous studies that used environmental risk scores (ERS) to assess the cumulative impact of environmental exposures on health are limited by the diversity of exposures included, especially for early life determinants. We used machine learning methods to build early life exposome risk scores for three health outcomes using environmental, molecular, and clinical data.

Methods: In this study, we analyzed data from 1622 mother-child pairs from the HELIX European birth cohorts, using over 300 environmental, 100 child peripheral, and 18 mother-child clinical markers to compute environmental-clinical risk scores (ECRS) for child behavioral difficulties, metabolic syndrome, and lung function. ECRS were computed using LASSO, Random Forest and XGBoost. XGBoost ECRS were selected to extract local feature contributions using Shapley values and derive feature importance and interactions.

Results: ECRS captured 13%, 50% and 4% of the variance in mental, cardiometabolic, and respiratory health, respectively. We observed no significant differences in predictive performances between the above-mentioned methods. The most important predictive features were maternal stress, noise, and lifestyle exposures for mental health; proteome (mainly IL1B) and metabolome features for cardiometabolic health; child BMI and urine metabolites for respiratory health.

Conclusions: Besides their usefulness for epidemiological research, our risk scores show great potential to capture holistic individual level non-hereditary risk associations that can inform practitioners about actionable factors of high-risk children. As in the post-genetic era personalized prevention medicine will focus more and more on modifiable factors, we believe that such integrative approaches will be instrumental in shaping future healthcare paradigms.
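
The Methods mention extracting local feature contributions with Shapley values. As a rough illustration of what such a contribution means, the sketch below computes exact Shapley values by brute force for a toy risk function. It is not the authors' pipeline: the abstract does not say which software was used, and exact enumeration is only feasible for a handful of features. The function `toy_risk_score`, the three feature names, the coefficients, and the all-zero reference child are hypothetical and chosen only to show how an interaction term is split between the features involved.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features outside a coalition are set to their baseline value.
    The cost grows exponentially with the number of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk_score(z):
    """Hypothetical score with an interaction between stress and noise."""
    stress, noise, activity = z
    return 2.0 * stress + 1.0 * noise - 0.5 * activity + 1.5 * stress * noise


child = [1.0, 1.0, 1.0]        # hypothetical standardized exposures
reference = [0.0, 0.0, 0.0]    # hypothetical reference child
contributions = shapley_values(toy_risk_score, child, reference)

for name, phi in zip(["stress", "noise", "activity"], contributions):
    print(f"{name}: {phi:+.2f}")
```

The contributions sum to the difference between the child's score and the reference score, and the stress-by-noise interaction is shared equally between the two features. This additivity is the property that lets per-child contributions be aggregated into overall feature importance.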