Heliyon (Sep 2024)

Predicting all-cause mortality and premature death using interpretable machine learning among a middle-aged and elderly Chinese population

  • Qi Yu,
  • Lingzhi Zhang,
  • Qian Ma,
  • Lijuan Da,
  • Jiahui Li,
  • Wenyuan Li

Journal volume & issue
Vol. 10, no. 17
p. e36878

Abstract


Objective: To develop machine learning-based models for predicting all-cause and premature mortality among middle-aged and elderly adults in China.

Method: Adults aged 45 years or older at the 2011 baseline of the China Health and Retirement Longitudinal Study (CHARLS) were included. A stacked ensemble model was built from five selected machine learning algorithms. The models were trained and tested on the CHARLS 2011–2015 cohort (derivation cohort) and then externally validated on the CHARLS 2015–2018 cohort (validation cohort). SHapley Additive exPlanations (SHAP) were used to quantify the importance of risk factors and to explain the machine learning models.

Result: The derivation cohort included 10,677 subjects, of whom 478 died during follow-up. The stacked ensemble model showed the highest discrimination for predicting both all-cause mortality and premature death, with an AUC (95% CI) of 0.826 (0.792–0.859) and 0.773 (0.725–0.821), respectively. In the validation cohort, the corresponding AUCs (95% CI) were 0.803 (0.743–0.864) and 0.791 (0.719–0.863). Age, sex, self-reported health, activities of daily living, cognitive function, ever smoking, and levels of systolic blood pressure, cystatin C and low-density lipoprotein were strong predictors of both all-cause mortality and premature death.

Conclusion: Stacked ensemble models performed well in predicting all-cause and premature death in this Chinese cohort. Interpretable techniques can help identify important risk factors and non-linear relationships between predictors and mortality.
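The abstract reports discrimination as an AUC with a 95% CI but does not state how the interval was computed. The following is a minimal sketch, not the authors' code, assuming a percentile bootstrap (one common choice; DeLong's method is another). It computes the AUC from predicted risks via the Mann-Whitney rank statistic, that is, the probability that a subject who died receives a higher predicted risk than one who survived. The function names `auc` and `bootstrap_auc_ci`, the defaults (1,000 resamples, seed 0), and the toy data are hypothetical illustrations.

```python
import random
from typing import List, Sequence, Tuple


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic (labels 1 = event, 0 = no event).

    Tied scores receive their average rank, which is equivalent to counting
    each tied event/non-event pair as 1/2.
    """
    n = len(scores)
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one event and one non-event")
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores order[i..j] and give each the mean rank.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def bootstrap_auc_ci(
    y_true: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Point AUC plus a percentile-bootstrap (1 - alpha) confidence interval.

    Assumption: subjects are resampled with replacement; this is a sketch,
    not necessarily the method used in the paper.
    """
    point = auc(y_true, scores)  # also checks that both classes are present
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        n_events = sum(1 for y in yb if y == 1)
        if 0 < n_events < n:  # skip resamples that lack one of the classes
            stats.append(auc(yb, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[round(alpha / 2 * (n_boot - 1))]
    hi = stats[round((1 - alpha / 2) * (n_boot - 1))]
    return point, lo, hi


if __name__ == "__main__":
    # Synthetic toy data (not CHARLS): 1 = died during follow-up.
    rng = random.Random(42)
    y = [1 if rng.random() < 0.1 else 0 for _ in range(1000)]
    risk = [rng.gauss(1.0 if yi else 0.0, 1.0) for yi in y]
    point, lo, hi = bootstrap_auc_ci(y, risk)
    print(f"AUC {point:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

The rank-based AUC runs in O(n log n) per resample, which keeps a 1,000-fold bootstrap practical for a cohort of roughly 10,000 subjects; the naive all-pairs comparison would be far slower.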

Keywords