Healthcare Analytics (Dec 2023)
A hybrid machine learning and natural language processing model for early detection of acute coronary syndrome
Abstract
Acute coronary syndrome (ACS) is a leading cause of mortality and morbidity. Predicting the risk of ACS in patients with chest pain from electronic health record data can help identify those who need more tailored care. This study proposes a reliable prediction framework to serve as a diagnostic support tool for preventing misdiagnosis among patients with clinical concerns for ACS. Data were collected from an urban, demographically diverse hospital in Detroit, Michigan, for patients who presented to the emergency department (ED) with a primary chief complaint of chest pain between January 2017 and August 2020. This study incorporated term frequency-inverse document frequency (TF-IDF) features from free-text summaries, which contain anecdotal symptom descriptions and are among the first data points recorded when a patient enters the ED. The analysis included 16,096 patients with clinical concerns for ACS and trained three machine learning models (logistic regression, AdaBoost, and linear discriminant analysis) across different data processing stages to distinguish patients with ACS from those with non-ACS etiologies. The AdaBoost model outperformed the other two models, achieving an accuracy of 94% and an F1-score of 0.943 in predicting ACS on the test data. This study identified key independent factors from patient demographics, comorbidities, and clinical narrative data that predicted ACS. The prediction framework can serve as a decision-support tool to classify ACS and to better inform physicians about ACS risk factors.
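The TF-IDF step mentioned above can be illustrated with a minimal, dependency-free sketch. The abstract does not specify the study's tokenization or TF-IDF variant. This sketch therefore assumes whitespace tokenization, term frequency normalized by note length, and an unsmoothed inverse document frequency, idf = ln(N / df). The helper names `tfidf` and `to_matrix` and the example notes are hypothetical. Libraries such as scikit-learn's `TfidfVectorizer` use a smoothed idf and L2 row normalization by default, so their weights would differ.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split only; a real pipeline would
    # likely also handle punctuation, abbreviations, and stop words.
    return text.lower().split()


def tfidf(docs):
    """Return one {term: weight} dict per document (hypothetical helper).

    tf  = count of term in the document / number of tokens in the document
    idf = ln(N / df), where df = number of documents containing the term
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: count each term at most once per document.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def to_matrix(vectors):
    # Align the sparse dicts to a shared, sorted vocabulary so that each note
    # becomes a fixed-length numeric row (missing terms get weight 0.0).
    vocab = sorted({term for vec in vectors for term in vec})
    rows = [[vec.get(term, 0.0) for term in vocab] for vec in vectors]
    return vocab, rows


# Hypothetical free-text chief-complaint summaries (not study data).
notes = [
    "chest pain radiating to left arm",
    "chest pain after fall",
    "shortness of breath and chest pressure",
]
vectors = tfidf(notes)
vocab, rows = to_matrix(vectors)
```

In this toy example, "chest" occurs in every note, so its weight is 0, whereas rarer terms such as "arm" or "fall" receive higher weights. Under the same assumptions, rows like these could be concatenated with demographic and comorbidity features before a classifier such as logistic regression, AdaBoost, or linear discriminant analysis is trained.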