BMC Cardiovascular Disorders (Aug 2024)

Machine learning-based model to predict composite thromboembolic events among Chinese elderly patients with atrial fibrillation

  • Jiefeng Ren,
  • Haijun Wang,
  • Song Lai,
  • Yi Shao,
  • Hebin Che,
  • Zaiyao Xue,
  • Xinlian Qi,
  • Sha Zhang,
  • Jinkun Dai,
  • Sai Wang,
  • Kunlian Li,
  • Wei Gan,
  • Quanjin Si

DOI
https://doi.org/10.1186/s12872-024-04082-9
Journal volume & issue
Vol. 24, no. 1
pp. 1 – 9

Abstract

Objective: Accurate prediction of survival prognosis helps guide clinical decision-making. The aim of this study was to develop a machine learning model to predict the occurrence of composite thromboembolic events (CTEs) in elderly patients with atrial fibrillation (AF). These events comprise newly diagnosed cerebral ischemic events, cardiovascular events, pulmonary embolism, and lower extremity arterial embolism.

Methods: This retrospective study included 6,079 elderly hospitalized patients (≥ 75 years old) with AF admitted to the People’s Liberation Army General Hospital in China from January 2010 to June 2022. Missing data were handled with random forest imputation. For the descriptive statistics, patients were divided into two groups according to whether CTEs occurred, and differences between the groups were analyzed using chi-square tests for categorical variables and rank-sum tests for continuous variables. For the machine learning analysis, patients were randomly divided into a training dataset (n = 4,225) and a validation dataset (n = 1,824) in a 7:3 ratio. Four machine learning models (logistic regression, decision tree, random forest, and XGBoost) were trained on the training dataset and evaluated on the validation dataset.

Results: The incidence of composite thromboembolic events was 19.53%. The Least Absolute Shrinkage and Selection Operator (LASSO) method with 5-fold cross-validation was applied to the training dataset and identified 18 features significantly associated with the occurrence of CTEs. The random forest model outperformed the other models, with the highest area under the curve (AUC: 0.927, 95% CI: 0.9105–0.9443) and an accuracy (ACC) of 0.9144, sensitivity (SEN) of 0.7725, and specificity (SPE) of 0.9489. The random forest model also showed good clinical validity on the clinical decision curve. SHapley Additive exPlanations (SHAP) analysis showed that the five features contributing most to the model were history of ischemic stroke, high triglyceride (TG), high total cholesterol (TC), high plasma D-dimer, and age.

Conclusions: This study proposes an accurate model for identifying patients at high risk of CTEs. The random forest model performed well. History of ischemic stroke, age, high TG, high TC, and high plasma D-dimer may be correlated with CTEs.
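For readers unfamiliar with the four metrics reported for the random forest model (ACC, SEN, SPE, AUC), the following minimal sketch shows how they are conventionally computed from true labels and predicted risks. It is not the authors' code: the function name `binary_metrics`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration and are not study data.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_score: Sequence[float],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Return accuracy, sensitivity, specificity and AUC for binary labels.

    y_true holds 0/1 labels (1 = event), y_score holds predicted risks.
    The 0.5 default threshold is an illustrative assumption.
    """
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1 and predicted == 0:
            fn += 1
        elif label == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1

    pos = [s for l, s in zip(y_true, y_score) if l == 1]
    neg = [s for l, s in zip(y_true, y_score) if l == 0]
    if not pos or not neg:
        raise ValueError("Both classes must be present to compute the metrics.")

    # AUC as the probability that a randomly chosen positive case receives a
    # higher score than a randomly chosen negative case (ties count as 0.5).
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)

    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),  # overall accuracy
        "SEN": tp / (tp + fn),                   # sensitivity (recall)
        "SPE": tn / (tn + fp),                   # specificity
        "AUC": wins / (len(pos) * len(neg)),
    }


# Toy example (made-up values, not from the study).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
metrics = binary_metrics(toy_labels, toy_scores)
print(metrics)
```

ACC, SEN, and SPE depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds. This is why a model can have a high AUC while its sensitivity at one fixed cut-off is noticeably lower.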

Keywords