Health Promotion and Chronic Disease Prevention in Canada (May 2023)

Using classification and regression trees to model missingness in youth BMI, height and body mass data

  • Amanda Doggett
  • Ashok Chaurasia
  • Jean-Philippe Chaput
  • Scott T. Leatherdale

DOI
https://doi.org/10.24095/hpcdp.43.5.03
Journal volume & issue
Vol. 43, no. 5
pp. 231 – 242

Abstract

Introduction

Research suggests that youth body mass index (BMI) data derived from self-reported measures often have a high degree of missingness, which may substantially affect research findings. The first step in handling missing data is to examine the levels and patterns of missingness. However, previous studies examining youth BMI missingness used logistic regression, which is limited in its ability to discern subgroups or to identify a hierarchy of variable importance. Both of these capabilities can greatly help in understanding missing data patterns.

Methods

This study used sex-stratified classification and regression tree (CART) models to examine missingness in height, body mass and BMI data among 74 501 youth participating in the 2018/19 COMPASS study (a prospective cohort study examining health behaviours among Canadian youth), in which 31% of BMI data were missing. Diet, movement, academic, mental health and substance use variables were examined for associations with missingness in height, body mass and BMI.

Results

The CART models indicated that the combination of being younger, perceiving oneself as overweight, being less physically active and having poorer mental health yielded female and male subgroups that were highly likely to be missing BMI values. Survey respondents who were older and did not perceive themselves as overweight were unlikely to be missing BMI values.

Conclusion

The subgroups identified by the CART models indicate that a sample that excludes cases with missing BMI would be biased towards physically, emotionally and mentally healthier youth. Given the ability of CART models to identify such subgroups and a hierarchy of variable importance, they are an invaluable tool for examining missing data patterns and for informing the appropriate handling of missing data.
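To make the modelling idea concrete, the following is a minimal sketch, under simplifying assumptions, of a single greedy CART step applied to a BMI-missingness indicator and run separately within each sex. It is not the authors' code, and the software and settings used in the study are not stated in this abstract. The `Respondent` type, the predictor names (`perceives_overweight`, `active`) and the toy data are hypothetical. BMI is treated as missing whenever height or body mass is missing, since BMI = mass (kg) / height (m)². Production CART implementations (for example, R's rpart or scikit-learn's `DecisionTreeClassifier`) also handle continuous and multi-category predictors, recurse, and prune.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Respondent:
    """One survey respondent (hypothetical structure, not the COMPASS schema)."""
    sex: str                   # stratification variable, e.g. "female" / "male"
    height_m: Optional[float]  # None means height was not reported
    mass_kg: Optional[float]   # None means body mass was not reported
    features: dict[str, int]   # hypothetical binary predictors (1 = yes, 0 = no)


def bmi(r: Respondent) -> Optional[float]:
    """BMI = mass (kg) / height (m)^2; missing if either input is missing."""
    if r.height_m is None or r.mass_kg is None:
        return None
    return r.mass_kg / r.height_m ** 2


def gini(labels: list[int]) -> float:
    """Gini impurity of binary labels (1 = BMI missing, 0 = BMI present)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows: list[Respondent]) -> tuple[Optional[str], float]:
    """One greedy CART step: choose the binary predictor whose split gives the
    largest reduction in weighted Gini impurity of the BMI-missing indicator."""
    y = [1 if bmi(r) is None else 0 for r in rows]
    parent = gini(y)
    best_name, best_gain = None, 0.0
    for name in rows[0].features:
        yes = [yi for r, yi in zip(rows, y) if r.features[name] == 1]
        no = [yi for r, yi in zip(rows, y) if r.features[name] == 0]
        child = (len(yes) * gini(yes) + len(no) * gini(no)) / len(y)
        if parent - child > best_gain:
            best_name, best_gain = name, parent - child
    return best_name, best_gain


def stratified_best_splits(rows: list[Respondent]) -> dict[str, tuple[Optional[str], float]]:
    """Run the split search separately within each sex (sex-stratified models)."""
    groups: dict[str, list[Respondent]] = {}
    for r in rows:
        groups.setdefault(r.sex, []).append(r)
    return {sex: best_split(group) for sex, group in groups.items()}


if __name__ == "__main__":
    # Toy data invented for illustration only (not COMPASS data).
    toy = [
        Respondent("female", None, 55.0, {"perceives_overweight": 1, "active": 0}),
        Respondent("female", 1.62, None, {"perceives_overweight": 1, "active": 1}),
        Respondent("female", 1.60, 55.0, {"perceives_overweight": 0, "active": 0}),
        Respondent("female", 1.65, 60.0, {"perceives_overweight": 0, "active": 1}),
    ]
    print(stratified_best_splits(toy))
```

Repeating this split search within each resulting child node is what produces the subgroups described in the Results, and the impurity reduction credited to each predictor across the tree is one common basis for ranking variable importance.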