Cerebral Circulation - Cognition and Behavior (Jan 2024)
Predicting Incident Dementia in Cerebral Small Vessel Disease: Comparison of Machine Learning and Traditional Statistical Models
Abstract
Introduction: Cerebral small vessel disease (SVD) contributes to 45% of dementia cases worldwide. Only a minority of SVD patients develop dementia, yet we lack a reliable model for predicting incident dementia in SVD. Most attempts to date have relied on traditional statistical approaches, whereas machine learning (ML) methods are increasingly used for clinical prediction in other settings.
Methods: We investigated whether ML methods improved prediction of incident dementia in SVD over traditional statistical approaches. We included three cohorts with varying SVD severity (RUN DMC, n=503; SCANS, n=121; HARMONISATION, n=265). Baseline demographics, vascular risk factors, cognitive scores, and MRI features of SVD were used for prediction. We conducted both a survival analysis and a classification analysis predicting 3-year dementia risk. For each analysis, several ML methods were evaluated against standard Cox or logistic regression. Finally, we compared the feature importance rankings produced by the different models.
Results: We included 789 participants without missing data in the survival analysis, among whom 108 (13.7%) developed dementia during a median (IQR) follow-up period of 5.4 (4.1, 8.7) years. After excluding those censored before three years, we included 750 participants in the classification analysis, among whom 48 (6.4%) developed dementia by year 3. Comparing statistical and ML models, only the regularised Cox/logistic regression models outperformed their statistical counterparts overall, although not significantly so in the survival analysis. Baseline cognitive scores were highly predictive, and all methods ranked global cognition as the most important feature.
Discussion: ML survival or classification models brought little improvement over traditional statistical approaches in predicting incident dementia in SVD. ML approaches may be better suited to prediction problems using a larger number of input variables.
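The step from the survival cohort to the 3-year classification cohort can be illustrated with a minimal sketch. It is not the authors' code: the function names, the record layout, and the toy records are hypothetical, and it assumes that a diagnosis exactly at three years counts as an event and that a diagnosis after three years is labelled as dementia-free at the horizon.

```python
from typing import List, Optional, Tuple

HORIZON_YEARS = 3.0  # prediction horizon stated in the abstract

# Hypothetical record layout: (participant id, follow-up time in years, dementia observed)
Record = Tuple[str, float, bool]


def label_at_horizon(time_years: float, dementia: bool,
                     horizon: float = HORIZON_YEARS) -> Optional[int]:
    """Return 1 if dementia occurred by the horizon, 0 if the participant was
    followed to the horizon without dementia by then, and None if the
    participant was censored before the horizon (excluded from classification)."""
    if dementia and time_years <= horizon:
        return 1  # assumption: an event exactly at the horizon counts
    if time_years >= horizon:
        return 0  # includes participants diagnosed only after the horizon
    return None  # censored before the horizon


def build_classification_set(records: List[Record],
                             horizon: float = HORIZON_YEARS) -> List[Tuple[str, int]]:
    """Keep only participants whose status at the horizon is known."""
    labelled = []
    for pid, time_years, dementia in records:
        label = label_at_horizon(time_years, dementia, horizon)
        if label is not None:
            labelled.append((pid, label))
    return labelled


def percent(events: int, total: int) -> float:
    """Event proportion as a percentage, rounded to one decimal place."""
    return round(100.0 * events / total, 1)


# Toy records, invented purely for illustration.
toy_records: List[Record] = [
    ("p1", 1.2, True),   # dementia within 3 years
    ("p2", 5.4, False),  # dementia-free past 3 years
    ("p3", 2.0, False),  # censored before 3 years, so excluded
    ("p4", 4.5, True),   # dementia after 3 years
]
toy_labels = build_classification_set(toy_records)

# Proportions reported in the abstract, recomputed from the stated counts.
survival_event_pct = percent(108, 789)
classification_event_pct = percent(48, 750)
```

Under these assumptions the event proportions recomputed from the reported counts agree with the percentages given in the Results (13.7% and 6.4%).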