Reviews in Cardiovascular Medicine (Jun 2023)

A Machine Learning Framework for Diagnosing and Predicting the Severity of Coronary Artery Disease

  • Aikeliyaer Ainiwaer,
  • Wen Qing Hou,
  • Kaisaierjiang Kadier,
  • Rena Rehemuding,
  • Peng Fei Liu,
  • Halimulati Maimaiti,
  • Lian Qin,
  • Xiang Ma,
  • Jian Guo Dai

DOI
https://doi.org/10.31083/j.rcm2406168
Journal volume & issue
Vol. 24, no. 6
p. 168

Abstract

Read online

Background: Although machine learning (ML)-based prediction of coronary artery disease (CAD) has gained increasing attention, assessment of the severity of suspected CAD in symptomatic patients remains challenging. Methods: The training set for this study consisted of 284 retrospective participants, while the test set included 116 prospectively enrolled participants from whom we collected 53 baseline variables and coronary angiography results. The data was pre-processed with outlier processing and One-Hot coding. In the first stage, we constructed a ML model that used baseline information to predict the presence of CAD with a dichotomous model. In the second stage, baseline information was used to construct ML regression models for predicting the severity of CAD. The non-CAD population was included, and two different scores were used as output variables. Finally, statistical analysis and SHAP plot visualization methods were employed to explore the relationship between baseline information and CAD. Results: The study included 269 CAD patients and 131 healthy controls. The eXtreme Gradient Boosting (XGBoost) model exhibited the best performance amongst the different models for predicting CAD, with an area under the receiver operating characteristic curve of 0.728 (95% CI 0.623–0.824). The main correlates were left ventricular ejection fraction, homocysteine, and hemoglobin (p < 0.001). The XGBoost model performed best for predicting the SYNTAX score, with the main correlates being brain natriuretic peptide (BNP), left ventricular ejection fraction, and glycated hemoglobin (p < 0.001). The main relevant features in the model predictive for the GENSINI score were BNP, high density lipoprotein, and homocysteine (p < 0.001). Conclusions: This data-driven approach provides a foundation for the risk stratification and severity assessment of CAD. Clinical Trial Registration: The study was registered in www.clinicaltrials.gov protocol registration system (number NCT05018715).

Keywords