PLOS Global Public Health (Jan 2024)

Predicting blood lead in Uruguayan children: Individual- vs neighborhood-level ensemble learners.

  • Seth Frndak,
  • Elena I Queirolo,
  • Nelly Mañay,
  • Guan Yu,
  • Zia Ahmed,
  • Gabriel Barg,
  • Craig Colder,
  • Katarzyna Kordas

DOI
https://doi.org/10.1371/journal.pgph.0003607
Journal volume & issue
Vol. 4, no. 9
p. e0003607

Abstract

Predicting childhood blood lead levels (BLLs) has had mixed success, and it is unclear whether individual- or neighborhood-level variables are more predictive. An ensemble machine learning (ML) approach was implemented to identify the most relevant predictors of BLL ≥2 μg/dL in urban children. A cross-sectional sample of 603 children (~7 years of age), recruited between 2009 and 2019 from Montevideo, Uruguay, participated in the study. In total, 77 individual-level and 32 neighborhood-level variables were used to predict BLLs ≥2 μg/dL. Three ensemble learners were created: one with individual-level predictors (Ensemble-I), one with neighborhood-level predictors (Ensemble-N), and one with both (Ensemble-All). Each ensemble learner comprised four base classifiers, and the data were split into 50% training, 25% validation, and 25% test datasets. Predictive performance of the three ensemble models was compared on the test dataset using the area under the receiver operating characteristic (ROC) curve (AUC), precision, sensitivity, and specificity. Ensemble-I (AUC: 0.75, precision: 0.56, sensitivity: 0.79, specificity: 0.65) performed similarly to Ensemble-All (AUC: 0.75, precision: 0.63, sensitivity: 0.79, specificity: 0.69). Ensemble-N (AUC: 0.51, precision: 0.0, sensitivity: 0.0, specificity: 0.50) severely underperformed. Year of enrollment was the most important predictor in Ensemble-I and Ensemble-All, followed by household water Pb. Three neighborhood-level variables were among the 10 most important predictors in Ensemble-All (density of bus routes, dwellings with stream/other water source, and distance to nearest river). The model with individual-level predictors only performed best, although precision improved when both neighborhood- and individual-level variables were included. Future predictive models of lead exposure should consider proximal predictors (i.e., household characteristics).
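
The evaluation described in the abstract (a 50%/25%/25% split and precision, sensitivity, and specificity computed on the test set) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `split_indices` and `binary_metrics`, the random seed, the rounding used to size the partitions, and the toy labels are all assumptions made for illustration, and reporting an undefined ratio as 0.0 is a convention chosen here, not one stated in the source.

```python
import random


def split_indices(n, seed=0):
    """Shuffle row indices and cut them into ~50% train, ~25% validation, ~25% test.

    Assumption: the exact partition sizes and seed used in the study are unknown.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = n // 2
    n_val = (n - n_train) // 2
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test


def binary_metrics(y_true, y_pred):
    """Precision, sensitivity, and specificity for 0/1 labels (1 = BLL at or above threshold)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num, den):
        # Convention assumed here: report 0.0 when the denominator is zero.
        return num / den if den else 0.0

    return {
        "precision": ratio(tp, tp + fp),    # share of predicted positives that are true
        "sensitivity": ratio(tp, tp + fn),  # share of true positives that are detected
        "specificity": ratio(tn, tn + fp),  # share of true negatives that are detected
    }


# Hypothetical toy labels, not study data.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)

# Partition sizes for a sample of 603 under the rounding assumed above.
train, val, test = split_indices(603)
```

Under this definition, a classifier that never identifies a true positive has a sensitivity of 0 and, if it still issues some false positives, a precision of 0 as well, which is the pattern reported for Ensemble-N.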