Automated risk assessment of newly detected atrial fibrillation poststroke from electronic health record data using machine learning and natural language processing

Sheng-Feng Sung; Kuan-Lin Sung; Ru-Chiou Pan; Pei-Ju Lee; Ya-Han Hu

doi:10.3389/fcvm.2022.941237

Frontiers in Cardiovascular Medicine (Jul 2022)

Affiliations

Sheng-Feng Sung: Division of Neurology, Department of Internal Medicine, Ditmanson Medical Foundation Chiayi Christian Hospital, Chiayi City, Taiwan
Sheng-Feng Sung: Department of Nursing, Min-Hwei Junior College of Health Care Management, Tainan, Taiwan
Kuan-Lin Sung: School of Medicine, National Taiwan University, Taipei, Taiwan
Ru-Chiou Pan: Clinical Data Center, Department of Medical Research, Ditmanson Medical Foundation Chiayi Christian Hospital, Chiayi City, Taiwan
Pei-Ju Lee: Department of Information Management and Institute of Healthcare Information Management, National Chung Cheng University, Chiayi County, Taiwan
Ya-Han Hu: Department of Information Management, National Central University, Taoyuan, Taiwan

Journal volume & issue: Vol. 9

Abstract

Background: Timely detection of atrial fibrillation (AF) after stroke is highly clinically relevant, aiding decisions on the optimal strategies for secondary prevention of stroke. When medical resources are limited, it is crucial to prioritize extended heart rhythm monitoring by stratifying patients into risk groups according to their likelihood of having newly detected AF (NDAF). This study aimed to develop an electronic health record (EHR)-based machine learning model to assess the risk of NDAF at an early stage after stroke.

Methods: Data linked between a hospital stroke registry and a deidentified research-based database, including EHRs and administrative claims data, were used. Demographic features, physiological measurements, routine laboratory results, and clinical free text were extracted from EHRs. The extreme gradient boosting algorithm was used to build the prediction model. Prediction performance was evaluated by the C-index and compared with that of the AS5F and CHASE-LESS scores.

Results: The study population consisted of a training set of 4,064 patients and a temporal test set of 1,492 patients. During a median follow-up of 10.2 months, the incidence rate of NDAF was 87.0 per 1,000 person-years in the test set. On the test set, the model based on both structured and unstructured data achieved a C-index of 0.840, which was significantly higher than those of the AS5F (0.779, p = 0.023) and CHASE-LESS (0.768, p = 0.005) scores.

Conclusions: It is feasible to build a machine learning model to assess the risk of NDAF based on EHR data available at the time of hospital admission. Inclusion of information derived from clinical free text can significantly improve model performance, and the resulting model may outperform risk scores developed using traditional statistical methods. Further studies are needed to assess the clinical usefulness of the prediction model.
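The C-index reported above measures how well a model ranks patients by risk. As a rough illustration only, the sketch below computes Harrell's C-index for right-censored follow-up data in plain Python. The abstract does not state which C-index variant or software the authors used, so the choice of Harrell's estimator, the function name `harrell_c_index`, and the toy follow-up data are all assumptions made for this example; none of the numbers come from the study.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risks: Sequence[float],
) -> float:
    """Harrell's C-index for right-censored data (illustrative sketch).

    times  : follow-up time for each patient
    events : 1 if the event (e.g. NDAF) was observed, 0 if censored
    risks  : model risk score; higher means higher predicted risk

    Simplification: pairs with identical follow-up times are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a comparable pair needs an observed event for patient i
        for j in range(n):
            if times[j] > times[i]:  # patient j was still event-free at times[i]
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # earlier event has higher risk: concordant
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data (not from the study): five patients.
toy_times = [2, 4, 6, 8, 10]           # months of follow-up
toy_events = [1, 0, 1, 0, 1]           # 1 = event observed, 0 = censored
toy_risks = [0.9, 0.3, 0.6, 0.7, 0.1]  # made-up model scores

toy_c = harrell_c_index(toy_times, toy_events, toy_risks)
print(f"C-index on toy data: {toy_c:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.840, 0.779, and 0.768 should be read.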

Published in Frontiers in Cardiovascular Medicine

ISSN: 2297-055X (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Medicine: Internal medicine: Specialties of internal medicine: Diseases of the circulatory (Cardiovascular) system
Website: https://www.frontiersin.org/journals/cardiovascular-medicine
