A Machine Learning Model Based on Genetic and Traditional Cardiovascular Risk Factors to Predict Premature Coronary Artery Disease

Benrong Liu; Lei Fang; Yujuan Xiong; Qiqi Du; Yang Xiang; Xiaohui Chen; Chao-Wei Tian; Shi-Ming Liu

doi:10.31083/j.fbl2707211

Frontiers in Bioscience-Landmark (Jul 2022)

Affiliations

Benrong Liu: Guangzhou Institute of Cardiovascular Disease, Guangdong Key Laboratory of Vascular Diseases, State Key Laboratory of Respiratory Disease, The Second Affiliated Hospital, Guangzhou Medical University, 510260 Guangzhou, Guangdong, China
Lei Fang: Guangzhou Institute of Cardiovascular Disease, Guangdong Key Laboratory of Vascular Diseases, State Key Laboratory of Respiratory Disease, The Second Affiliated Hospital, Guangzhou Medical University, 510260 Guangzhou, Guangdong, China
Yujuan Xiong: Department of Laboratory Medicine, Panyu Hospital of Chinese Medicine, Guangzhou University of Chinese Medicine, 511400 Guangzhou, Guangdong, China
Qiqi Du: Department of Emergency, The Second Affiliated Hospital, Guangzhou Medical University, 510260 Guangzhou, Guangdong, China
Yang Xiang: General Practice, Guangzhou Medical University, 510182 Guangzhou, Guangdong, China
Xiaohui Chen: Department of Emergency, The Second Affiliated Hospital, Guangzhou Medical University, 510260 Guangzhou, Guangdong, China
Chao-Wei Tian: Guangzhou Institute of Cardiovascular Disease, Guangdong Key Laboratory of Vascular Diseases, State Key Laboratory of Respiratory Disease, The Second Affiliated Hospital, Guangzhou Medical University, 510260 Guangzhou, Guangdong, China
Shi-Ming Liu: Guangzhou Institute of Cardiovascular Disease, Guangdong Key Laboratory of Vascular Diseases, State Key Laboratory of Respiratory Disease, The Second Affiliated Hospital, Guangzhou Medical University, 510260 Guangzhou, Guangdong, China

Journal volume & issue: Vol. 27, no. 7, p. 211

Abstract

Background: Premature coronary artery disease (PCAD) has a poor prognosis and high rates of mortality and disability. Accurate prediction of PCAD risk is important for the prevention and early diagnosis of the disease. Machine learning (ML) has proven to be a reliable method for disease diagnosis and for building risk prediction models from complex factors. The aim of the present study was to develop an accurate PCAD risk prediction model that allows early intervention.

Methods: We performed a retrospective analysis of single nucleotide polymorphisms (SNPs) and traditional cardiovascular risk factors (TCRFs) in 131 PCAD patients and 187 controls. The data were used to construct classifiers for predicting PCAD risk with three ML algorithms in scikit-learn: LogisticRegression (LRC), RandomForestClassifier (RFC) and GradientBoostingClassifier (GBC). Three quarters of the participants were randomly assigned to a training dataset and the remainder to a test dataset. Classifier performance was evaluated using the area under the receiver operating characteristic curve (AUC), sensitivity and the concordance index. Nomograms were constructed with R packages.

Results: Three optimized feature combinations (FCs) were identified: RS-DT-FC1 (rs2259816, rs1378577, rs10757274, rs4961, smoking, hyperlipidemia, glucose, triglycerides), RS-DT-FC2 (rs1378577, rs10757274, smoking, diabetes, hyperlipidemia, glucose, triglycerides) and RS-DT-FC3 (rs1169313, rs5082, rs9340799, rs10757274, rs1152002, smoking, hyperlipidemia, high-density lipoprotein cholesterol). Classifiers built with these FCs achieved an AUC >0.90 and a sensitivity >0.90. The nomograms built with RS-DT-FC1, RS-DT-FC2 and RS-DT-FC3 had concordance indices of 0.94, 0.94 and 0.90, respectively, when validated with the test dataset, and 0.79, 0.82 and 0.79 when validated with the training dataset.
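
The Methods describe a standard scikit-learn workflow: a 75/25 train/test split, three classifiers, and evaluation by AUC, sensitivity and concordance index. The sketch below reproduces only the shape of that workflow, on synthetic data generated with make_classification; the study data are not reproduced here, so none of the printed numbers relate to the reported results. Assumptions not stated in the abstract: the stratified split, the random seeds, the hyperparameters, standardizing features before LogisticRegression, and the default 0.5 threshold used to compute sensitivity. The helper concordance_index is a hypothetical name; it counts case-control pairs explicitly to show that, for a binary outcome, the concordance index is the AUC.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def concordance_index(y_true, scores):
    """Fraction of (case, control) pairs in which the case has the higher score.

    Tied scores count as half-concordant. For a binary outcome this equals the AUC.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    cases = scores[y_true == 1]
    controls = scores[y_true == 0]
    # Compare every case score with every control score via broadcasting.
    diff = cases[:, None] - controls[None, :]
    concordant = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return concordant / diff.size


# Synthetic stand-in (ASSUMPTION): 318 rows with roughly 187 controls and 131 cases,
# 8 features mimicking the size of one feature combination. NOT the study cohort.
X, y = make_classification(
    n_samples=318, n_features=8, n_informative=5, weights=[187 / 318],
    flip_y=0.0, random_state=0,
)

# Three quarters for training, one quarter for testing (stratified: an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

classifiers = {
    "LRC": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "RFC": RandomForestClassifier(n_estimators=200, random_state=0),
    "GBC": GradientBoostingClassifier(random_state=0),
}

results = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    proba = clf.predict_proba(X_test)[:, 1]  # predicted probability of the positive class
    pred = clf.predict(X_test)               # hard labels at the default 0.5 threshold
    results[name] = {
        "auc": roc_auc_score(y_test, proba),
        "sensitivity": recall_score(y_test, pred),  # TP / (TP + FN)
        "c_index": concordance_index(y_test, proba),
    }
    print(name, {k: round(float(v), 3) for k, v in results[name].items()})
```

Because the concordance index of a binary outcome equals the AUC, the auc and c_index entries agree for every classifier; the two measures differ only for censored time-to-event data, where Harrell's C is computed over comparable pairs.
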
Manual prediction of the test data with the three nomograms yielded AUCs of 0.89, 0.92 and 0.83 and sensitivities of 0.92, 0.96 and 0.86, respectively.

Conclusions: The choice of features determines the performance of ML models. RS-DT-FC2 may be a suitable FC for building a high-performance PCAD prediction model with good sensitivity and accuracy. The nomograms allow practical scoring and interpretation of each predictor and may help clinicians assess the risk of PCAD.
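
The nomograms were built with R packages that the abstract does not name. As a rough Python analogue of how a nomogram turns a logistic model into per-predictor scores, the sketch below converts a fitted binary LogisticRegression into points: each predictor's log-odds contribution is measured from the end of its observed range that contributes nothing, then scaled so that the most influential predictor spans 0 to 100 points (a common nomogram convention, assumed here). Summing the points and inverting the scaling recovers the model's predicted risk. The function nomogram_points and the synthetic data are hypothetical illustrations, not the authors' R code, features or coefficients.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def nomogram_points(model, X_ref):
    """Build points-based scoring from a fitted binary LogisticRegression (hypothetical helper).

    X_ref supplies each predictor's observed range; the predictor with the widest
    effect on the log-odds scale is scaled to span 0-100 points.
    """
    beta = model.coef_.ravel()
    lo, hi = X_ref.min(axis=0), X_ref.max(axis=0)
    # Reference value: the end of each predictor's range that contributes zero points.
    ref = np.where(beta >= 0, lo, hi)
    # Largest possible log-odds contribution of any single predictor.
    max_effect = np.max(np.abs(beta) * (hi - lo))
    points_per_logit = 100.0 / max_effect

    def score(x):
        # Log-odds above the reference (non-negative for x inside the observed range).
        contrib = beta * (np.asarray(x, dtype=float) - ref)
        points = contrib * points_per_logit
        return points, points.sum()

    def total_points_to_risk(total):
        # Undo the scaling to get the linear predictor, then apply the logistic link.
        lp = model.intercept_[0] + np.dot(beta, ref) + total / points_per_logit
        return 1.0 / (1.0 + np.exp(-lp))

    return score, total_points_to_risk


# Usage on synthetic data (ASSUMPTION: 4 continuous predictors, not the study's FCs).
X, y = make_classification(n_samples=318, n_features=4, n_informative=3,
                           n_redundant=0, random_state=1)
model = LogisticRegression().fit(X, y)
score, to_risk = nomogram_points(model, X)

per_predictor, total = score(X[0])  # points for each predictor of one participant
risk = to_risk(total)               # predicted probability from the total score
print(np.round(per_predictor, 1), round(float(total), 1), round(float(risk), 3))
```

Reading a printed nomogram works the same way by hand: look up the points for each predictor, add them, and read the risk off the total-points axis, which is why the manual nomogram predictions can be compared directly with the classifiers' AUC and sensitivity.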

Published in Frontiers in Bioscience-Landmark

ISSN: 2768-6701 (Print); 2768-6698 (Online)
Publisher: IMR Press
Country of publisher: Singapore
LCC subjects: Science: Chemistry: Organic chemistry: Biochemistry; Science: Biology (General)
Website: https://www.imrpress.com/journal/FBL