Journal of Medical Internet Research (Jun 2023)
Predicting Disengagement to Better Support Outcomes in a Web-Based Weight Loss Program Using Machine Learning Models: Cross-Sectional Study
Abstract
Background: Engagement is key to interventions that achieve successful behavior change and improvements in health. There is limited literature on the application of predictive machine learning (ML) models to data from commercially available weight loss programs to predict disengagement. Such predictions could be used to help participants achieve their goals.
Objective: This study aimed to use explainable ML to predict the risk of member disengagement week by week over 12 weeks on a commercially available web-based weight loss program.
Methods: Data were available from 59,686 adults who participated in the weight loss program between October 2014 and September 2019. Data included year of birth, sex, height, weight, motivation to join the program, use statistics (eg, weight entries, entries into the food diary, views of the menu, and program content), program type, and weight loss. Random forest, extreme gradient boosting, and logistic regression with L1 regularization models were developed and validated using a 10-fold cross-validation approach. In addition, temporal validation was performed on a test cohort of 16,947 members who participated in the program between April 2018 and September 2019, and the remaining data were used for model development. Shapley values were used to identify globally relevant features and explain individual predictions.
Results: The average age of the participants was 49.60 (SD 12.54) years, the average starting BMI was 32.43 (SD 6.19), and 81.46% (39,594/48,604) of the participants were female. The class distributions (active and inactive members) changed from 39,369 and 9235 in week 2 to 31,602 and 17,002 in week 12, respectively. With 10-fold cross-validation, extreme gradient boosting models had the best predictive performance, which ranged from 0.85 (95% CI 0.84-0.85) to 0.93 (95% CI 0.93-0.93) for area under the receiver operating characteristic curve and from 0.57 (95% CI 0.56-0.58) to 0.95 (95% CI 0.95-0.96) for area under the precision-recall curve (across 12 weeks of the program). The models were also well calibrated. Results obtained with temporal validation ranged from 0.51 to 0.95 for area under the precision-recall curve and from 0.84 to 0.93 for area under the receiver operating characteristic curve across the 12 weeks. The area under the precision-recall curve improved considerably, by 20%, in week 3 of the program. On the basis of the computed Shapley values, the most important features for predicting disengagement in the following week were those related to the total activity on the platform and entering a weight in the previous weeks.
Conclusions: This study showed the potential of applying ML predictive algorithms to help predict and understand participants’ disengagement with a web-based weight loss program. Given the association between engagement and health outcomes, these findings can prove valuable in providing better support to individuals to enhance their engagement and potentially achieve greater weight loss.
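The two discrimination metrics reported in the Results can be illustrated with a minimal sketch. It is not the study's code: the toy labels and scores are invented, the function names are hypothetical, and it assumes that "inactive" is treated as the positive class. It also shows why the area under the precision-recall curve depends on class balance, since a classifier with no skill scores about the prevalence of the positive class, which can be computed from the week 2 and week 12 class counts given above.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a random
    positive case is scored above a random negative case (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve: the mean of the
    precision values at the rank of each positive case.
    Assumes distinct scores (tied scores are not handled specially)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return precision_sum / hits


def prevalence(positives: int, negatives: int) -> float:
    """Share of positive cases; the expected AUPRC of a no-skill classifier."""
    return positives / (positives + negatives)


# Invented toy data: 1 = inactive next week (assumed positive class), 0 = active.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

toy_auroc = auroc(labels, scores)
toy_auprc = average_precision(labels, scores)

# No-skill AUPRC baselines from the class counts reported in the abstract.
baseline_week2 = prevalence(positives=9235, negatives=39369)
baseline_week12 = prevalence(positives=17002, negatives=31602)

print(f"toy AUROC={toy_auroc:.3f}, toy AUPRC={toy_auprc:.3f}")
print(f"no-skill AUPRC: week 2={baseline_week2:.3f}, week 12={baseline_week12:.3f}")
```

Under that assumption, the no-skill baseline for the area under the precision-recall curve rises between week 2 and week 12 as more members become inactive, whereas the no-skill area under the receiver operating characteristic curve stays at 0.5. This is one reason the two metrics can span very different ranges across the 12 weeks.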