Explainable machine learning for predicting conversion to neurological disease: Results from 52,939 medical records

Christina Felix; Joshua D Johnston; Kelsey Owen; Emil Shirima; Sidney R Hinds; Kenneth D Mandl; Alex Milinovich; Jay L Alberts

doi:10.1177/20552076241249286

Digital Health (Apr 2024)

Affiliations

Christina Felix: Neurological Institute, , Cleveland, OH, USA
Joshua D Johnston: Department of Biomedical Engineering, , Cleveland, OH, USA
Kelsey Owen: Department of Biomedical Engineering, , Cleveland, OH, USA
Emil Shirima: Neurological Institute, , Cleveland, OH, USA
Sidney R Hinds: Department of Neurology, , Bethesda, MD, USA
Kenneth D Mandl: Computational Health Informatics Program, , Boston, MA, USA
Alex Milinovich: Department of Quantitative Health Sciences, , Cleveland, OH, USA
Jay L Alberts: Department of Biomedical Engineering, , Cleveland, OH, USA

Journal volume & issue: Vol. 10

Abstract

Objective: This study assesses interpretable machine learning models built on electronic medical record data for predicting conversion to neurological disease.

Methods: A retrospective dataset was compiled of Cleveland Clinic patients diagnosed with Alzheimer's disease, amyotrophic lateral sclerosis, multiple sclerosis, or Parkinson's disease, together with controls matched on age, sex, race, and ethnicity. Individualized risk prediction models were created using eXtreme Gradient Boosting for each neurological disease at four timepoints in patient history. The prediction models were assessed for transparency and fairness.

Results: At 0, 12, 24, and 60 months prior to diagnosis, the models achieved the following areas under the receiver operating characteristic curve on a holdout test dataset, respectively: Alzheimer's disease, 0.794, 0.742, 0.709, and 0.645; amyotrophic lateral sclerosis, 0.883, 0.710, 0.658, and 0.620; multiple sclerosis, 0.922, 0.877, 0.849, and 0.781; and Parkinson's disease, 0.809, 0.738, 0.700, and 0.651.

Conclusions: The results demonstrate that electronic medical records contain latent information that can be used for risk stratification for neurological disorders. In particular, patient-reported outcomes, sleep assessments, falls data, additional disease diagnoses, and longitudinal changes in patient health, such as weight change, are important predictors.
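The metric reported in the abstract, area under the receiver operating characteristic curve, can be read as the probability that a randomly chosen patient who later received the diagnosis is assigned a higher predicted risk than a randomly chosen matched control. The following is a minimal Python sketch of that pairwise definition. The function name `auroc` and the toy labels and scores are illustrative assumptions, not the authors' code or data.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition.

    labels: iterable of 0/1 (1 = later diagnosed, 0 = matched control)
    scores: iterable of predicted risks, higher = more likely a case
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    # Count case/control pairs where the case is scored higher; ties count half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
toy_auc = auroc(toy_labels, toy_scores)
print(round(toy_auc, 3))
```

On this reading, a value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls, which gives a scale for the declining values reported as the prediction window moves further from diagnosis.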

Published in Digital Health

ISSN: 2055-2076 (Online)
Publisher: SAGE Publishing
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://journals.sagepub.com/home/dhj
