Machine learning approaches improve risk stratification for secondary cardiovascular disease prevention in multiethnic patients

Jiang Li; Fátima Rodriguez; Andrew Ward; Ashish Sarraju; David Scheinker; Sukyung Chung

doi:10.1136/openhrt-2021-001802

Open Heart (Dec 2021)

Affiliations

Jiang Li: GI Department, Peking University First Hospital, Beijing, Beijing, China
Fátima Rodriguez: Division of Cardiovascular Medicine and Cardiovascular Institute, Stanford University School of Medicine, Stanford, California, USA
Andrew Ward: Department of Electrical Engineering, Stanford University, Stanford, California, USA
Ashish Sarraju: Division of Cardiovascular Medicine and Cardiovascular Institute, Stanford University School of Medicine, Stanford, California, USA
David Scheinker: Department of Management Science and Engineering, Stanford University School of Engineering, Stanford, California, USA
Sukyung Chung: Palo Alto Medical Foundation Research Institute, Palo Alto, California, USA

DOI: https://doi.org/10.1136/openhrt-2021-001802
Journal volume & issue: Vol. 8, no. 2

Abstract

Objectives Identifying high-risk patients is crucial for effective cardiovascular disease (CVD) prevention. It is not known whether electronic health record (EHR)-based machine-learning (ML) models can improve CVD risk stratification compared with a secondary prevention risk score developed from randomised clinical trials (Thrombolysis in Myocardial Infarction Risk Score for Secondary Prevention, TRS 2°P).

Methods We identified patients with CVD in a large health system, including those with atherosclerotic CVD (ASCVD), and split them into 80% training and 20% test sets. A rich set of EHR patient features was extracted. ML models (random forests (RF), gradient-boosted machines (GBM), extreme gradient-boosted models (XGBoost), and logistic regression with either an L2 penalty or an L1 penalty (Lasso)) were trained to estimate 5-year CVD event risk. The ML models and TRS 2°P were evaluated by the area under the receiver operating characteristic curve (AUC).

Results The cohort included 32 192 patients (median age 74 years; 46% female; 63% non-Hispanic white and 12% Asian), of whom 23 475 had ASCVD. There were 4010 events over 5 years of follow-up. The ML models demonstrated good overall performance: XGBoost achieved an AUC of 0.70 (95% CI 0.68 to 0.71) in the full CVD cohort and 0.71 (95% CI 0.69 to 0.73) in patients with ASCVD, with comparable performance by GBM, RF and Lasso. TRS 2°P performed poorly in both the full CVD cohort (AUC 0.51, 95% CI 0.50 to 0.53) and patients with ASCVD (AUC 0.50, 95% CI 0.48 to 0.52). The ML models identified nontraditional predictive variables, including education level and primary care visits.

Conclusions In a multiethnic real-world population, EHR-based ML approaches significantly improved CVD risk stratification for secondary prevention.
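The abstract compares models by AUC with a 95% CI. As a rough illustration of what those numbers measure, the sketch below computes AUC as the probability that a randomly chosen patient with an event receives a higher predicted risk than one without (the rank-sum, or Mann-Whitney, form), and attaches a 95% CI by percentile bootstrap. The abstract does not say how the authors derived their CIs, so the bootstrap, the function names `auc` and `bootstrap_auc_ci`, and their parameters are assumptions made for illustration only.

```python
import random


def auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney U) identity.

    labels: 1 = CVD event within follow-up, 0 = no event (hypothetical encoding).
    scores: predicted risk from any model; tied scores receive average ranks.
    """
    n = len(scores)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one event and one non-event")
    # Assign 1-based ranks in ascending score order, averaging over ties.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    # Sum of event ranks minus its minimum possible value, normalised to [0, 1].
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for AUC (an assumed method, not the paper's)."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("need both classes to bootstrap AUC")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # skip resamples that lack one of the classes
            stats.append(auc(y, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    risk = [0.1, 0.4, 0.35, 0.8]
    print(auc(y, risk))  # 0.75
```

An AUC near 0.5, as reported for TRS 2°P, means the score ranks patients with events above patients without events about as often as chance would. An AUC of 0.70 means a patient with an event outranks one without about 70% of the time.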

Published in Open Heart

ISSN: 2053-3624 (Online)
Publisher: BMJ Publishing Group
Country of publisher: United Kingdom
LCC subjects: Medicine: Internal medicine: Specialties of internal medicine: Diseases of the circulatory (Cardiovascular) system
Website: https://openheart.bmj.com

About the journal