BMC Psychiatry (Dec 2024)
Using machine learning to predict the probability of incident 2-year depression in older adults with chronic diseases: a retrospective cohort study
Abstract
Background: Older adults with chronic diseases are at higher risk of depressive symptoms than those without. In the Chinese older population, it is unclear how well changes in common risk factors over a 2-year follow-up period predict the onset of depressive symptoms. This study aimed to build risk prediction models (RPMs) to estimate the probability of incident 2-year depression using data from the China Health and Retirement Longitudinal Study (CHARLS).

Methods: Four machine learning (ML) algorithms (logistic regression [LR], AdaBoost, random forest [RF] and k-nearest neighbor [kNN]) were applied to develop RPMs using the 2011–2015 cohort data. The developed models were then validated with 2018–2020 survey data. Model performance was evaluated using discrimination and calibration metrics: the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), accuracy, sensitivity and calibration plots. Finally, key factors associated with depressive symptoms were explored using the best-performing models.

Results: A total of 7,121 participants were included to build models predicting depressive symptoms, and the prevalence of depression was 21.5%. Combining the Synthetic Minority Oversampling Technique (SMOTE) with logistic regression (LR-SM) outperformed the other models in predicting depression, with an AUROC of 0.612, an AUPRC of 0.468, an accuracy of 0.619 and a sensitivity of 0.546. In addition, external validation of the LR-SM model using the 2018–2020 data also demonstrated good predictive ability, with an AUROC of 0.623 (95% CI: 0.555–0.673). Sex, self-rated health status, occupation, eyesight, memory and life satisfaction were identified as impactful predictors of depression.

Conclusions: Our developed models exhibited high accuracy, good discrimination and good calibration in predicting the two-year risk of depression among older adults with chronic diseases.
This model can be used to identify older Chinese adults at high risk of depression, allowing timely intervention.
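The LR-SM pipeline summarised above (oversample the minority class with SMOTE, fit logistic regression, then report AUROC, AUPRC, accuracy and sensitivity) can be sketched roughly as follows. This is a minimal, dependency-free Python illustration, not the authors' code: the simplified SMOTE variant, the unpenalised gradient-descent logistic regression, the 0.5 decision threshold, AUPRC summarised as average precision, and the toy Gaussian data are all assumptions, since the abstract does not give the study's features, preprocessing or hyperparameters. It also assumes the common practice of oversampling only the training split, so that the held-out metrics reflect the original class imbalance. Calibration plots are not covered.

```python
import math
import random

def smote(minority, n_new, k=5, rng=None):
    """Simplified SMOTE (assumed variant): each synthetic sample lies on the segment
    between a random minority sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),  # Euclidean distance
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Unpenalised logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict_proba(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

def auroc(y, p):
    # probability that a random positive outranks a random negative (ties count 0.5)
    pos = [s for s, t in zip(p, y) if t == 1]
    neg = [s for s, t in zip(p, y) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y, p):
    # AUPRC summarised as average precision over the score-ranked list (ties not pooled)
    hits, total = 0, 0.0
    for k, (_, t) in enumerate(sorted(zip(p, y), key=lambda st: -st[0]), start=1):
        if t == 1:
            hits += 1
            total += hits / k
    return total / hits

def accuracy_sensitivity(y, p, threshold=0.5):
    pred = [1 if s >= threshold else 0 for s in p]
    acc = sum(a == t for a, t in zip(pred, y)) / len(y)
    sens = sum(a == 1 and t == 1 for a, t in zip(pred, y)) / sum(y)
    return acc, sens

# Hypothetical toy data (two Gaussian features, ~22% positives); not CHARLS data.
rng = random.Random(42)

def make(n_neg, n_pos):
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_neg)]
    X += [[rng.gauss(1, 1), rng.gauss(1, 1)] for _ in range(n_pos)]
    return X, [0] * n_neg + [1] * n_pos

X_train, y_train = make(200, 55)
X_test, y_test = make(100, 28)

# Oversample the minority class in the training split only, up to class balance.
minority = [x for x, t in zip(X_train, y_train) if t == 1]
extra = smote(minority, n_new=200 - 55, k=5, rng=rng)
X_bal, y_bal = X_train + extra, y_train + [1] * len(extra)

model = fit_logistic(X_bal, y_bal)
p_test = predict_proba(model, X_test)
print("AUROC", round(auroc(y_test, p_test), 3),
      "AUPRC", round(average_precision(y_test, p_test), 3),
      "acc/sens", accuracy_sensitivity(y_test, p_test))
```

For real analyses, tested implementations exist in imbalanced-learn (`SMOTE`) and scikit-learn (`LogisticRegression`, `roc_auc_score`, `average_precision_score`); the hand-written versions here only make each step visible.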
Keywords