Digital Health (Apr 2024)
Explainable machine learning for predicting conversion to neurological disease: Results from 52,939 medical records
Abstract
Objective: This study assesses the use of interpretable machine learning models built on electronic medical record data to predict conversion to neurological disease.
Methods: A retrospective dataset was compiled of Cleveland Clinic patients diagnosed with Alzheimer's disease, amyotrophic lateral sclerosis, multiple sclerosis, or Parkinson's disease, together with controls matched on age, sex, race, and ethnicity. Individualized risk prediction models were created using eXtreme Gradient Boosting for each neurological disease at four timepoints in patient history. The prediction models were assessed for transparency and fairness.
Results: At 0, 12, 24, and 60 months prior to diagnosis, the models achieved the following areas under the receiver operating characteristic curve on a holdout test dataset: Alzheimer's disease, 0.794, 0.742, 0.709, and 0.645; amyotrophic lateral sclerosis, 0.883, 0.710, 0.658, and 0.620; multiple sclerosis, 0.922, 0.877, 0.849, and 0.781; and Parkinson's disease, 0.809, 0.738, 0.700, and 0.651, respectively.
Conclusions: The results demonstrate that electronic medical records contain latent information that can be used for risk stratification for neurological disorders. In particular, patient-reported outcomes, sleep assessments, falls data, additional disease diagnoses, and longitudinal changes in patient health, such as weight change, are important predictors.
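The reported metric, area under the receiver operating characteristic curve (AUROC), can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The following is a minimal sketch of that calculation in plain Python, using the rank-sum identity. It is not the authors' evaluation code: the function name `auroc` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for a case, 0 for a control (assumed binary).
    scores: predicted risk, higher meaning more likely to be a case.
    Tied scores contribute 0.5 through average ranks.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one control")

    # Sort indices by score, then give each group of tied scores its average rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic for the cases, normalized by the number of case-control pairs.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Hypothetical toy data: two controls (0) and two cases (1).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores))  # 0.75
```

On this scale, 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls, which gives a reference frame for the values reported above across the four timepoints.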