JMIR Cardio (May 2023)

Data Quality Degradation on Prediction Models Generated From Continuous Activity and Heart Rate Monitoring: Exploratory Analysis Using Simulation

  • Jason Hearn,
  • Jef Van den Eynde,
  • Bhargava Chinni,
  • Ari Cedars,
  • Danielle Gottlieb Sen,
  • Shelby Kutty,
  • Cedric Manlhiot

DOI
https://doi.org/10.2196/40524
Journal volume & issue
Vol. 7
p. e40524

Abstract

Background: Limited data accuracy is often cited as a reason for caution in the integration of physiological data obtained from consumer-oriented wearable devices in care management pathways. The effect of decreasing accuracy on predictive models generated from these data has not been previously investigated.

Objective: The aim of this study is to simulate the effect of data degradation on the reliability of prediction models generated from those data and thus determine the extent to which lower device accuracy might or might not limit their use in clinical settings.

Methods: Using the Multilevel Monitoring of Activity and Sleep in Healthy People data set, which includes continuous free-living step count and heart rate data from 21 healthy volunteers, we trained a random forest model to predict cardiac competence. Model performance in 75 perturbed data sets with increasing missingness, noisiness, bias, and a combination of all 3 perturbations was compared to model performance for the unperturbed data set.

Results: The unperturbed data set achieved a mean root mean square error (RMSE) of 0.079 (SD 0.001) in predicting cardiac competence index. For all types of perturbations, RMSE remained stable up to 20%-30% perturbation. Above this level, RMSE started increasing and reached the point at which the model was no longer predictive at 80% for noise, 50% for missingness, and 35% for the combination of all perturbations. Introducing systematic bias in the underlying data had no effect on RMSE.

Conclusions: In this proof-of-concept study, the performance of predictive models for cardiac competence generated from continuously acquired physiological data was relatively stable with declining quality of the source data. As such, lower accuracy of consumer-oriented wearable devices might not be an absolute contraindication for their use in clinical prediction models.
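
To make the three perturbation types and the RMSE metric concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not specify how missingness, noise, or bias were generated, so the function names, the missing-at-random assumption, the uniform multiplicative noise model, the relative bias model, and the synthetic heart rate series are all hypothetical choices made for illustration.

```python
import math
import random


def add_missingness(values, fraction, rng):
    """Replace a random `fraction` of samples with None (assumption: missing at random)."""
    n_missing = round(len(values) * fraction)
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def add_noise(values, fraction, rng):
    """Scale each sample by a random factor in [1 - fraction, 1 + fraction] (assumed noise model)."""
    return [None if v is None else v * (1 + rng.uniform(-fraction, fraction)) for v in values]


def add_bias(values, fraction):
    """Scale every sample by the same factor 1 + fraction (assumed systematic bias model)."""
    return [None if v is None else v * (1 + fraction) for v in values]


def rmse(predicted, observed):
    """Root mean square error between two equal-length numeric sequences."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    squared_errors = ((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(sum(squared_errors) / len(observed))


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic heart rate series for illustration only; not data from the study.
heart_rate = [60.0 + (i % 40) for i in range(200)]

with_gaps = add_missingness(heart_rate, 0.30, rng)
noisy = add_noise(heart_rate, 0.30, rng)
biased = add_bias(heart_rate, 0.30)

# One possible way to combine all 3 perturbations at the same level.
combined = add_bias(add_noise(add_missingness(heart_rate, 0.30, rng), 0.30, rng), 0.30)
```

In a simulation of this kind, a model would be refit or re-evaluated on each perturbed series and its RMSE compared with the RMSE obtained on the unperturbed series; the order in which perturbations are combined here is likewise an assumption.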