Prediction of incident myocardial infarction using machine learning applied to harmonized electronic health record data

Divneet Mandair; Premanand Tiwari; Steven Simon; Kathryn L. Colborn; Michael A. Rosenberg

doi:10.1186/s12911-020-01268-x

BMC Medical Informatics and Decision Making (Oct 2020)

Affiliations

Divneet Mandair: Division of Internal Medicine, University of Colorado School of Medicine
Premanand Tiwari: Colorado Center for Personalized Medicine, University of Colorado School of Medicine
Steven Simon: Division of Cardiology and Cardiac Electrophysiology, University of Colorado School of Medicine
Kathryn L. Colborn: Department of Surgery, University of Colorado School of Medicine
Michael A. Rosenberg: Division of Internal Medicine, University of Colorado School of Medicine

Journal volume & issue: Vol. 20, no. 1
pp. 1 – 10

Abstract

Background
With cardiovascular disease increasing, substantial research has focused on developing prediction tools. We compared deep learning and machine learning models with a baseline logistic regression that used only ‘known’ risk factors for predicting incident myocardial infarction (MI) from harmonized electronic health record (EHR) data.

Methods
We performed a large-scale case-control study of 2.27 million patients in the UCHealth system, with 6-month incident MI as the outcome. Data were harmonized to the Observational Medical Outcomes Partnership (OMOP) common data model, and the top 800 of an initial 52k procedures, diagnoses, and medications were used. We compared several oversampling and undersampling techniques to address class imbalance in the dataset. The models compared were regularized logistic regression, random forest, gradient boosting machines, and shallow and deep neural networks; the baseline for comparison was a logistic regression using a limited set of ‘known’ MI risk factors. Hyperparameters were selected using 10-fold cross-validation.

Results
Of these patients, 20,591 were diagnosed with MI, compared with 2.25 million who were not. A deep neural network (DNN) with random undersampling provided better classification than the other methods. However, its benefit over a logistic regression model using only ‘known’ risk factors was only moderate: the DNN achieved an F1 score of 0.092 and an AUC of 0.835. Calibration was poor for all models despite adequate discrimination, owing to overfitting caused by the low frequency of the event of interest.

Conclusions
Our study suggests that DNNs trained on harmonized data may not offer substantial benefit over traditional methods that use established risk factors for MI.
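
The abstract reports that random undersampling paired with a DNN performed best and summarizes performance with the F1 score and AUC. With MI occurring in roughly 0.9% of patients (20,591 of about 2.27 million), the minimal Python sketch below illustrates what random undersampling and these two metrics compute. It is not the authors' code: the function names, the 1:1 `ratio` default, and the toy data are hypothetical, and the pairwise AUC loop is quadratic, so a rank-based implementation or an established library would be needed at the scale of this study.

```python
import random


def random_undersample(rows, labels, ratio=1.0, seed=0):
    """Keep every positive (minority-class) example plus a random subset of negatives.

    `ratio` is a hypothetical parameter: the number of negatives kept per
    positive. The abstract does not report the ratio used in the study.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), round(ratio * len(pos)))
    keep = pos + rng.sample(neg, n_keep)
    rng.shuffle(keep)  # mix positives and negatives instead of leaving positives first
    return [rows[i] for i in keep], [labels[i] for i in keep]


def f1_score(labels, preds):
    """Harmonic mean of precision and recall for binary 0/1 predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. This pairwise loop is O(n_pos * n_neg) and is
    meant only for illustration.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 2 MI cases among 6 patients.
    labels = [1, 0, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
    preds = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold
    print(f1_score(labels, preds))  # 0.5
    print(roc_auc(labels, scores))  # 0.875
    X_bal, y_bal = random_undersample(list(range(6)), labels, ratio=1.0)
    print(sum(y_bal), len(y_bal))  # 2 4
```

In general, undersampling is applied only to training data (for example, within each cross-validation training fold), while metrics are computed on untouched, imbalanced held-out data. AUC is threshold-free and compares positives with negatives pairwise, so it does not depend on class prevalence; F1 depends on precision, which tends to fall as the outcome becomes rarer. This is consistent with a model reaching an AUC of 0.835 alongside an F1 score of only 0.092.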

Published in BMC Medical Informatics and Decision Making

ISSN: 1472-6947 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: http://bmcmedinformdecismak.biomedcentral.com
