Scientific Reports (Aug 2024)

A better performing algorithm for identification of implausible growth data from longitudinal pediatric medical records

  • Kylie K. Harrall,
  • Sarah M. Bird,
  • Keith E. Muller,
  • Lauren A. Vanderlinden,
  • Maya E. Payton,
  • Anna Bellatorre,
  • Dana Dabelea,
  • Deborah H. Glueck

DOI: https://doi.org/10.1038/s41598-024-69161-5
Journal volume & issue: Vol. 14, no. 1, pp. 1–11

Abstract


Tracking trajectories of body size in children provides insight into chronic disease risk. One measure of pediatric body size is body mass index (BMI), a function of height and weight. Errors in measuring height or weight may lead to an incorrect assessment of BMI. Childhood measures of height and weight extracted from electronic medical records often include values that seem biologically implausible in the context of a growth trajectory. Removing biologically implausible values reduces noise in the data. This makes it easier to model associations between exposures and childhood BMI trajectories, or between childhood BMI trajectories and subsequent health conditions. We developed open-source algorithms (available on GitHub) for detecting and removing biologically implausible values in pediatric trajectories of height and weight. A Monte Carlo simulation experiment compared the sensitivity, specificity and speed of our algorithms with those of three published algorithms. The comparator algorithms were selected because they used trajectory information, had open-source code, and had published verification studies. Simulation inputs were derived from longitudinal epidemiological cohorts. Compared with the three published algorithms, our algorithms had higher specificity, with similar sensitivity and speed. The results suggest that our algorithms should be adopted for cleaning longitudinal pediatric growth data.
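The abstract defines BMI as a function of height and weight and scores algorithms by sensitivity and specificity against simulated data whose true error status is known. The Python sketch below illustrates those two quantities only. It is not the authors' algorithm. The detector `flag_implausible`, its `rel_tol` threshold, and the example weights are hypothetical. The detector is a deliberately simple leave-one-out median check that ignores growth over time, unlike the trajectory-based methods the paper compares.

```python
from statistics import median


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def flag_implausible(values: list[float], rel_tol: float = 0.3) -> list[bool]:
    """Hypothetical toy detector, NOT the published algorithm.

    Flags a measurement whose relative deviation from the median of the
    child's other measurements exceeds rel_tol. It ignores growth over
    time, so it is only reasonable for short follow-up windows.
    """
    flags = []
    for i, v in enumerate(values):
        reference = median(values[:i] + values[i + 1:])  # leave value i out
        flags.append(abs(v - reference) / reference > rel_tol)
    return flags


def sensitivity_specificity(truth: list[bool], flagged: list[bool]) -> tuple[float, float]:
    """Score flags against known error status (as in a simulation with injected errors).

    sensitivity = TP / (TP + FN): share of true errors that were flagged
    specificity = TN / (TN + FP): share of valid values that were kept
    """
    pairs = list(zip(truth, flagged))
    tp = sum(t and f for t, f in pairs)
    fn = sum(t and not f for t, f in pairs)
    tn = sum(not t and not f for t, f in pairs)
    fp = sum(not t and f for t, f in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical weights (kg) for one child; the third value simulates an entry error.
weights = [10.0, 11.0, 30.0, 13.0, 14.0]
is_error = [False, False, True, False, False]

flags = flag_implausible(weights)
print(flags)  # [False, False, True, False, False]
print(sensitivity_specificity(is_error, flags))  # (1.0, 1.0)
print(round(bmi(14.0, 0.95), 1))  # 15.5
```

In a Monte Carlo experiment these per-child counts would be accumulated over many simulated trajectories before the ratios are computed; a detector that flags many valid values lowers specificity even when it catches every true error.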