Journal of Big Data (Jul 2024)

An ensemble machine learning model for predicting one-year mortality in elderly coronary heart disease patients with anemia

  • Longcan Cheng,
  • Yan Nie,
  • Hongxia Wen,
  • Yan Li,
  • Yali Zhao,
  • Qian Zhang,
  • Mingxing Lei,
  • Shihui Fu

DOI
https://doi.org/10.1186/s40537-024-00966-x
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 20

Abstract

Read online

Abstract Objective This study was designed to develop and validate a robust predictive model for one-year mortality in elderly coronary heart disease (CHD) patients with anemia using machine learning methods. Methods Demographics, tests, comorbidities, and drugs were collected for a cohort of 974 elderly patients with CHD. A prospective analysis was performed to evaluate predictive performances of the developed models. External validation of models was performed in a series of 112 elderly CHD patients with anemia. Results The overall one-year mortality was 43.6%. Risk factors included heart rate, chronic heart failure, tachycardia and β receptor blockers. Protective factors included hemoglobin, albumin, high density lipoprotein cholesterol, estimated glomerular filtration rate (eGFR), left ventricular ejection fraction (LVEF), aspirin, clopidogrel, calcium channel blockers, angiotensin converting enzyme inhibitors (ACEIs)/angiotensin receptor blockers (ARBs), and statins. Compared with other algorithms, an ensemble machine learning model performed the best with area under the curve (95% confidence interval) being 0.828 (0.805–0.870) and Brier score being 0.170. Calibration and density curves further confirmed favorable predicted probability and discriminative ability of an ensemble machine learning model. External validation of Ensemble Model also exhibited good performance with area under the curve (95% confidence interval) being 0.825 (0.734–0.916) and Brier score being 0.185. Patients in the high-risk group had more than six-fold probability of one-year mortality compared with those in the low-risk group (P < 0.001). Shaley Additive exPlanation identified the top five risk factors that associated with one-year mortality were hemoglobin, albumin, eGFR, LVEF, and ACEIs/ARBs. Conclusions This model identifies key risk factors and protective factors, providing valuable insights for improving risk assessment, informing clinical decision-making and performing targeted interventions. It outperforms other algorithms with predictive performance and provides significant opportunities for personalized risk mitigation strategies, with clinical implications for improving patient care.

Keywords