Digital Health (Apr 2024)

Explainable machine learning for predicting conversion to neurological disease: Results from 52,939 medical records

  • Christina Felix
  • Joshua D Johnston
  • Kelsey Owen
  • Emil Shirima
  • Sidney R Hinds
  • Kenneth D Mandl
  • Alex Milinovich
  • Jay L Alberts

DOI: https://doi.org/10.1177/20552076241249286
Journal volume & issue: Vol. 10

Abstract

Objective: This study assesses the application of interpretable machine learning modeling to electronic medical record data for predicting conversion to neurological disease.

Methods: A retrospective dataset was compiled of Cleveland Clinic patients diagnosed with Alzheimer's disease, amyotrophic lateral sclerosis, multiple sclerosis, or Parkinson's disease, together with controls matched on age, sex, race, and ethnicity. Individualized risk prediction models were created using eXtreme Gradient Boosting for each neurological disease at four timepoints in patient history. The prediction models were assessed for transparency and fairness.

Results: At 0, 12, 24, and 60 months prior to diagnosis, the models achieved the following areas under the receiver operating characteristic curve on a holdout test dataset: Alzheimer's disease, 0.794, 0.742, 0.709, and 0.645; amyotrophic lateral sclerosis, 0.883, 0.710, 0.658, and 0.620; multiple sclerosis, 0.922, 0.877, 0.849, and 0.781; and Parkinson's disease, 0.809, 0.738, 0.700, and 0.651, respectively.

Conclusions: The results demonstrate that electronic medical records contain latent information that can be used for risk stratification for neurological disorders. In particular, patient-reported outcomes, sleep assessments, falls data, additional disease diagnoses, and longitudinal changes in patient health, such as weight change, are important predictors.
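
As an aid to reading the reported metric, the area under the receiver operating characteristic curve can be understood as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as half. The sketch below computes it that way in plain Python. It is an illustration only, not the authors' code: the function name `auroc` and the toy labels and scores are hypothetical, and the study's models, features, and data are not reproduced here.

```python
def auroc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: iterable of 0/1 values (1 = case, 0 = control).
    scores: iterable of predicted risk scores, same length as labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case ranked above control
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two cases, two controls.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.3, 0.4, 0.1]
print(auroc(toy_labels, toy_scores))  # 0.75: three of four case/control pairs are ordered correctly
```

Under this reading, a value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls, which gives a scale for the decline in the reported values as the prediction window extends further before diagnosis.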