Journal of Medical Internet Research (Jan 2023)

Can a Single Variable Predict Early Dropout From Digital Health Interventions? Comparison of Predictive Models From Two Large Randomized Trials

  • Jonathan Bricker,
  • Zhen Miao,
  • Kristin Mull,
  • Margarita Santiago-Torres,
  • David M Vock

DOI
https://doi.org/10.2196/43629
Journal volume & issue
Vol. 25
p. e43629

Abstract

Background: A single generalizable metric that accurately predicts early dropout from digital health interventions has the potential to readily inform intervention targets and treatment augmentations that could boost retention and intervention outcomes. We recently identified a type of early dropout from digital health interventions for smoking cessation, specifically, users who logged in during the first week of the intervention and had little to no activity thereafter. These users also had a substantially lower smoking cessation rate with our iCanQuit smoking cessation app compared with users who used the app for longer periods.
Objective: This study aimed to explore whether log-in count data, using standard statistical methods, can precisely predict whether an individual will become an iCanQuit early dropout while validating the approach using other statistical methods and randomized trial data from 3 other digital interventions for smoking cessation (combined randomized N=4529).
Methods: Standard logistic regression models were used to predict early dropouts for individuals receiving the iCanQuit smoking cessation intervention app, the National Cancer Institute QuitGuide smoking cessation intervention app, the WebQuit.org smoking cessation intervention website, and the Smokefree.gov smoking cessation intervention website. The main predictors were the number of times a participant logged in per day during the first 7 days following randomization. The area under the curve (AUC) assessed the performance of the logistic regression models, which were compared with decision trees, support vector machine, and neural network models. We also examined whether 13 baseline variables that included a variety of demographics (eg, race and ethnicity, gender, and age) and smoking characteristics (eg, use of e-cigarettes and confidence in being smoke free) might improve this prediction.
Results: The AUC for each logistic regression model using only the first 7 days of log-in count variables was 0.94 (95% CI 0.90-0.97) for iCanQuit, 0.88 (95% CI 0.83-0.93) for QuitGuide, 0.85 (95% CI 0.80-0.88) for WebQuit.org, and 0.60 (95% CI 0.54-0.66) for Smokefree.gov. Replacing logistic regression models with more complex decision trees, support vector machines, or neural network models did not significantly increase the AUC, nor did including additional baseline variables as predictors. The sensitivity and specificity were generally good, and they were excellent for iCanQuit (ie, 0.91 and 0.85, respectively, at the 0.5 classification threshold).
Conclusions: Logistic regression models using only the first 7 days of log-in count data were generally good at predicting early dropouts. These models performed well when using simple, automated, and readily available log-in count data, whereas including self-reported baseline variables did not improve the prediction. The results will inform the early identification of people at risk of early dropout from digital health interventions with the goal of intervening further by providing them with augmented treatments to increase their retention and, ultimately, their intervention outcomes.
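
As an illustration of the kind of analysis the abstract describes, the following is a minimal sketch, not the authors' code. It fits a logistic regression to 7 daily log-in counts and reports the AUC together with sensitivity and specificity at the 0.5 classification threshold. Everything specific here is an assumption made for the example: the data are synthetic (generated by the hypothetical `simulate_user`), the model is fitted by plain gradient descent using only the Python standard library, and the function names, sample size, and train/test split do not come from the study.

```python
import math
import random


def simulate_user(rng):
    """Return (7 daily log-in counts, early-dropout label) for one SYNTHETIC user."""
    dropout = rng.random() < 0.4  # assumed dropout share, not from the study
    # Assumed pattern: early dropouts log in at first, then activity fades.
    rates = [1.5, 0.8, 0.4, 0.2, 0.1, 0.1, 0.1] if dropout else [1.5] * 7
    counts = [sum(rng.random() < r / 3 for _ in range(3)) for r in rates]
    return counts, int(dropout)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Predicted probability of early dropout for one user's counts x."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Fit logistic regression by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold=0.5):
    """Sensitivity and specificity; assumes both classes are present."""
    tp = sum(s >= threshold and l == 1 for s, l in zip(scores, labels))
    fn = sum(s < threshold and l == 1 for s, l in zip(scores, labels))
    tn = sum(s < threshold and l == 0 for s, l in zip(scores, labels))
    fp = sum(s >= threshold and l == 0 for s, l in zip(scores, labels))
    return tp / (tp + fn), tn / (tn + fp)


def run_demo(seed=0, n_users=800, n_train=600):
    """Train on synthetic users, then evaluate on held-out synthetic users."""
    rng = random.Random(seed)
    data = [simulate_user(rng) for _ in range(n_users)]
    X = [counts for counts, _ in data]
    y = [label for _, label in data]
    w, b = fit_logistic(X[:n_train], y[:n_train])
    scores = [predict_proba(w, b, x) for x in X[n_train:]]
    held_out = y[n_train:]
    sens, spec = sensitivity_specificity(scores, held_out, threshold=0.5)
    return auc(scores, held_out), sens, spec


if __name__ == "__main__":
    model_auc, sens, spec = run_demo()
    print(f"AUC={model_auc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

The numbers this sketch prints reflect only the simulated log-in pattern and say nothing about the trial results reported above. With real data, the synthetic generator would be replaced by each participant's observed daily log-in counts and early-dropout status.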