Frontiers in Cardiovascular Medicine (Feb 2021)
Development and Validation of a Predictive Model for Coronary Artery Disease Using Machine Learning
Abstract
Early identification of coronary artery disease (CAD) can help prevent disease progression and lower mortality. We therefore aimed to construct and validate a machine learning model that predicts CAD risk from conventional risk factors and laboratory test data. A total of 3,112 CAD patients and 3,182 controls were enrolled from three centers in China. We compared baseline and clinical characteristics between the two groups. A Random Forest algorithm was then used to construct a CAD prediction model, which was assessed using the receiver operating characteristic (ROC) curve. In the development cohort, the Random Forest model discriminated CAD patients from controls with an area under the curve (AUC) of 0.948 (95% CI: 0.941–0.954), a sensitivity of 90%, a specificity of 85.4%, a positive predictive value of 0.863, and a negative predictive value of 0.894. Validation also showed favorable discriminatory ability: the AUC, sensitivity, specificity, positive predictive value, and negative predictive value were 0.944 (95% CI: 0.934–0.955), 89.5%, 85.8%, 0.868, and 0.886 in validation cohort 1, and 0.940 (95% CI: 0.922–0.960), 79.5%, 94.3%, 0.932, and 0.823 in validation cohort 2, respectively. An easy-to-use tool that combines 15 indexes to assess CAD risk was constructed and validated using the Random Forest algorithm and showed favorable predictive capability (http://45.32.120.149:3000/randomforest). The model may be valuable in clinical practice and helpful for the management and primary prevention of CAD.
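For readers less familiar with the reported metrics, the following is a minimal sketch of how sensitivity, specificity, positive and negative predictive value, and AUC are defined. It is not the authors' code: the function names `diagnostic_metrics` and `roc_auc` and all counts and scores are hypothetical illustrations, and the AUC is computed with the simple pairwise (Mann-Whitney) formulation rather than any particular library routine.

```python
from typing import Dict, Sequence


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute standard diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true CAD cases flagged
        "specificity": tn / (tn + fp),  # share of controls correctly cleared
        "ppv": tp / (tp + fp),          # share of positive calls that are CAD
        "npv": tn / (tn + fn),          # share of negative calls that are controls
    }


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random control.

    Ties count as half a win. This is O(n_pos * n_neg), which is fine for
    illustration but slow for large cohorts.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, not taken from the study.
example_metrics = diagnostic_metrics(tp=90, fp=15, tn=85, fn=10)
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

Sensitivity and specificity depend only on the classifier and its threshold, whereas the predictive values also depend on the proportion of cases in the cohort, which is close to 50% in the enrolled sample described above.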
Keywords