Biomolecules (Dec 2023)
Prediction of Parkinson’s Disease Using Machine Learning Methods
Abstract
The detection of Parkinson’s disease (PD) in its early stages is of great importance for its treatment and management, but consensus is lacking on what information is necessary and what models should be used to best predict PD risk. In our study, we first grouped PD-associated factors based on their cost and accessibility, and then gradually incorporated them into risk prediction models, which were built using eight commonly used machine learning algorithms to allow for comprehensive assessment. Finally, the Shapley Additive Explanations (SHAP) method was used to investigate the contribution of each factor. We found that models built with demographic variables, hospital admission examinations, clinical assessments, and polygenic risk scores achieved the best prediction performance, and that adding invasive biomarkers did not further improve accuracy. Among the eight machine learning models considered, penalized logistic regression and XGBoost were the most accurate for assessing PD risk, with penalized logistic regression achieving an area under the curve of 0.94 and a Brier score of 0.08. Olfactory function and polygenic risk score were the most important predictors of PD risk. Our research offers a practical framework for PD risk assessment that highlights the necessary information and efficient machine learning tools.
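The two performance metrics reported above can be illustrated with a minimal sketch. The area under the curve measures discrimination (how often a case is scored above a control), while the Brier score measures the mean squared error of the predicted probabilities. The function names and the toy labels and probabilities below are hypothetical and chosen only for illustration; they are not the study's data or code.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Fraction of (case, control) pairs in which the case gets the higher
    predicted probability; tied pairs count as one half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 1 = PD, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]

print(f"AUC   = {roc_auc(labels, probs):.3f}")
print(f"Brier = {brier_score(labels, probs):.3f}")
```

A higher area under the curve and a lower Brier score both indicate better performance, so a model can be compared on discrimination and on the quality of its probability estimates at the same time.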
Keywords