Ranking of a wide multidomain set of predictor variables of children obesity by machine learning variable importance techniques

Helena Marcos-Pasero; Gonzalo Colmenarejo; Elena Aguilar-Aguilar; Ana Ramírez de Molina; Guillermo Reglero; Viviana Loria-Kohen

doi:10.1038/s41598-021-81205-8

Scientific Reports (Jan 2021)

Affiliations

Helena Marcos-Pasero: Nutrition and Clinical Trials Unit, GENYAL Platform IMDEA-Food Institute, CEI UAM+CSIC
Gonzalo Colmenarejo: Biostatistics and Bioinformatics Unit, IMDEA-Food Institute, CEI UAM+CSIC
Elena Aguilar-Aguilar: Nutrition and Clinical Trials Unit, GENYAL Platform IMDEA-Food Institute, CEI UAM+CSIC
Ana Ramírez de Molina: Molecular Oncology and Nutritional Genomics of Cancer, IMDEA-Food Institute, CEI UAM+CSIC
Guillermo Reglero: Production and Development of Foods for Health, IMDEA-Food Institute, CEI UAM+CSIC
Viviana Loria-Kohen: Nutrition and Clinical Trials Unit, GENYAL Platform IMDEA-Food Institute, CEI UAM+CSIC

DOI: https://doi.org/10.1038/s41598-021-81205-8
Journal volume & issue: Vol. 11, no. 1
pp. 1 – 14

Abstract

The increased prevalence of childhood obesity is expected to translate, in the near future, into a concomitant rise in multiple cardio-metabolic diseases. Obesity has a complex, multifactorial etiology that spans potential risk factors from several domains: genetics, dietary and physical activity habits, socio-economic environment, lifestyle, etc. In addition, all these factors are expected to exert their influence in a specific and especially convoluted way during childhood, given the fast growth during this period. Machine Learning methods are well suited to modeling this complexity, given their ability to cope with high-dimensional, non-linear data. Here, we have used Machine Learning to analyze a sample of 221 children (6–9 years) from Madrid, Spain. Both Random Forest and Gradient Boosting Machine models have been derived to predict the body mass index from a wide set of 190 multidomain variables (including age, sex, genetic polymorphisms, and lifestyle, socio-economic, diet, exercise, and gestation variables). A consensus relative importance of the predictors has been estimated through variable importance measures, implemented robustly through an iterative process that included permutation and multiple imputation. We expect this analysis to help shed light on the variables most strongly associated with childhood obesity, and thereby to guide the choice of better interventions for its prevention.
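
The abstract gives no implementation details, so the following is only a minimal sketch of one possible reading of the workflow it describes: impute missing values several times, fit a Random Forest and a Gradient Boosting model on each imputed data set, score the predictors by permutation importance, and average the normalized scores into a consensus ranking. It assumes Python with scikit-learn and uses synthetic data. The function name `consensus_importance`, its parameters, and the normalization step are hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.inspection import permutation_importance


def consensus_importance(X, y, n_imputations=3, n_repeats=5, seed=0):
    """Average normalized permutation importances over imputations and models.

    Hypothetical helper: an illustration, not the authors' pipeline.
    """
    scores = []
    for m in range(n_imputations):
        # Each pass draws a different stochastic imputation of the missing values.
        imputer = IterativeImputer(sample_posterior=True, max_iter=5,
                                   random_state=seed + m)
        X_imp = imputer.fit_transform(X)
        models = (
            RandomForestRegressor(n_estimators=100, random_state=seed + m),
            GradientBoostingRegressor(random_state=seed + m),
        )
        for model in models:
            model.fit(X_imp, y)
            # Importance is scored on the training data to keep the sketch short;
            # held-out data would usually be preferable.
            result = permutation_importance(model, X_imp, y,
                                            n_repeats=n_repeats,
                                            random_state=seed + m)
            imp = np.clip(result.importances_mean, 0.0, None)
            total = imp.sum()
            # Normalize so that both model types contribute on the same scale.
            scores.append(imp / total if total > 0 else imp)
    return np.mean(scores, axis=0)


# Synthetic stand-in for the study data (assumption: 6 predictors, 200 rows).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.5, size=200)
X[rng.random(X.shape) < 0.10] = np.nan  # about 10% of values missing at random

importance = consensus_importance(X, y)
ranking = np.argsort(importance)[::-1]  # predictor indices, most important first
print(ranking, np.round(importance, 3))
```

Normalizing each importance vector before averaging is one simple way to keep either model type from dominating the consensus; averaging ranks instead of scores would be an equally plausible alternative.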

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
