Scientific Reports (Oct 2024)

Utilizing machine learning to predict participant response to follow-up health surveys in the Millennium Cohort Study

  • Wisam Barkho,
  • Nathan C. Carnes,
  • Claire A. Kolaja,
  • Xin M. Tu,
  • Satbir K. Boparai,
  • Sheila F. Castañeda,
  • Beverly D. Sheppard,
  • Jennifer L. Walstrom,
  • Jennifer N. Belding,
  • Rudolph P. Rull,
  • the Millennium Cohort Study Team

DOI
https://doi.org/10.1038/s41598-024-77563-8
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 12

Abstract


The Millennium Cohort Study is a longitudinal study that collects self-reported survey data to examine the long-term effects of military service. Participant nonresponse to follow-up surveys is a potential threat to the validity and generalizability of study findings. In recent years, predictive analytics has emerged as a promising tool for identifying predictors of nonresponse. Here, we developed a high-skill classifier using machine learning techniques to predict participant response to follow-up surveys of the Millennium Cohort Study. Six supervised algorithms were employed to predict response to the 2021 follow-up survey. Using latent class analysis (LCA), we classified participants based on their historical survey response and compared prediction performance with and without this variable. Feature analysis was then conducted on the best-performing model. When the LCA variable was included, all six algorithms performed comparably. Without the LCA variable, random forest outperformed the benchmark regression model; however, overall prediction performance decreased. Feature analysis identified the LCA variable as the most important predictor. Our findings highlight the importance of historical response for improving the prediction of participant response to follow-up surveys. Machine learning algorithms can be especially valuable when historical data are not available. Implementing these methods in longitudinal studies can enhance outreach efforts by strategically targeting participants, ultimately boosting survey response rates and mitigating nonresponse.
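The abstract does not specify the LCA software, indicators, or number of classes used. As a minimal sketch, assuming each participant's history is a vector of 0/1 indicators for response to prior survey waves, a latent class model can be fit as a mixture of independent Bernoulli distributions via expectation-maximization (EM). Each participant is then assigned to the most probable class, yielding the kind of categorical variable that could be passed to a classifier. The function names `fit_lca` and `modal_class`, the five-wave toy data, and the choice of three classes are all hypothetical.

```python
import math


def fit_lca(data, n_classes, n_iter=200, eps=1e-6):
    """Fit a latent class model with binary indicators using EM.

    data: one list per participant of 0/1 flags (1 = responded to that wave).
    Returns (class_weights, item_probs, posteriors).
    """
    n, n_items = len(data), len(data[0])
    weights = [1.0 / n_classes] * n_classes
    # Deterministic graded start (an assumption for this sketch): class k begins
    # with P(respond) = (k + 1) / (K + 1) on every wave.
    probs = [[(k + 1) / (n_classes + 1)] * n_items for k in range(n_classes)]
    posteriors = []
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each participant,
        # computed in log space and normalized with the log-sum-exp trick
        posteriors = []
        for row in data:
            logs = [
                math.log(weights[k])
                + sum(math.log(p if x else 1.0 - p) for x, p in zip(row, probs[k]))
                for k in range(n_classes)
            ]
            top = max(logs)
            unnorm = [math.exp(v - top) for v in logs]
            total = sum(unnorm)
            posteriors.append([u / total for u in unnorm])
        # M-step: re-estimate class sizes and per-wave response probabilities
        for k in range(n_classes):
            nk = max(sum(post[k] for post in posteriors), 1e-12)
            weights[k] = nk / n
            for j in range(n_items):
                num = sum(post[k] * row[j] for post, row in zip(posteriors, data))
                # Clip away from 0 and 1 so the logs above stay finite
                probs[k][j] = min(max(num / nk, eps), 1.0 - eps)
    return weights, probs, posteriors


def modal_class(posteriors):
    """Assign each participant to the class with the highest posterior."""
    return [max(range(len(post)), key=lambda k: post[k]) for post in posteriors]


# Hypothetical response histories over five prior survey waves
history = (
    [[1, 1, 1, 1, 1]] * 10    # consistent responders
    + [[0, 0, 0, 0, 0]] * 10  # consistent nonresponders
    + [[1, 1, 1, 0, 0]] * 6   # responded early, then stopped
)
weights, probs, post = fit_lca(history, n_classes=3)
labels = modal_class(post)  # candidate categorical feature for a classifier
print(labels[0], labels[10], labels[20])
```

In practice, the number of classes is usually chosen with fit criteria such as BIC, and multiple random starts help avoid local optima. The graded start above only serves to keep this toy example deterministic.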

Keywords