A Simple Free-Text-like Method for Extracting Semi-Structured Data from Electronic Health Records: Exemplified in Prediction of In-Hospital Mortality

Eyal Klang; Matthew A. Levin; Shelly Soffer; Alexis Zebrowski; Benjamin S. Glicksberg; Brendan G. Carr; Jolion Mcgreevy; David L. Reich; Robert Freeman

doi:10.3390/bdcc5030040

Big Data and Cognitive Computing (Aug 2021)

Affiliations

Eyal Klang: Chaim Sheba Medical Center, Department of Diagnostic Imaging, Affiliated to Tel-Aviv University, Tel Aviv-Yafo 52621, Israel
Matthew A. Levin: Department of Anesthesiology, Perioperative and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
Shelly Soffer: Internal Medicine B, Assuta Medical Center, Ben-Gurion University of the Negev, Be’er Sheva 7747629, Israel
Alexis Zebrowski: Department of Emergency Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
Benjamin S. Glicksberg: Hasso Plattner Institute for Digital Health at Mount Sinai, New York, NY 10065, USA
Brendan G. Carr: Department of Emergency Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
Jolion Mcgreevy: Department of Emergency Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
David L. Reich: Department of Anesthesiology, Perioperative and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
Robert Freeman: Institute for Healthcare Delivery Science, Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA

DOI: https://doi.org/10.3390/bdcc5030040
Journal volume & issue: Vol. 5, no. 3, p. 40

Abstract

The Epic electronic health record (EHR) is a commonly used EHR in the United States. This EHR contains large semi-structured “flowsheet” fields. Flowsheet fields lack a well-defined data dictionary and are unique to each site. We evaluated a simple free-text-like method to extract these data. As a use case, we demonstrate this method in predicting mortality during emergency department (ED) triage. We retrieved demographic and clinical data for ED visits from the Epic EHR (1/2014–12/2018). Data included structured fields, semi-structured flowsheet records, and free-text notes. The study outcome was in-hospital death within 48 h. Most of the data were coded using a free-text-like Bag-of-Words (BoW) approach. Two machine-learning models were trained: gradient boosting and logistic regression. Term frequency-inverse document frequency (tf-idf) was employed in the logistic regression model (LR-tf-idf). An ensemble of LR-tf-idf and gradient boosting was also evaluated. Models were trained on years 2014–2017 and tested on year 2018. Among 412,859 visits, the 48-h mortality rate was 0.2%. LR-tf-idf showed AUC 0.98 (95% CI: 0.98–0.99). Gradient boosting showed AUC 0.97 (95% CI: 0.96–0.99). An ensemble of both showed AUC 0.99 (95% CI: 0.98–0.99). In conclusion, a free-text-like approach can be useful for extracting knowledge from large amounts of complex semi-structured EHR data.
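To make the free-text-like idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes each flowsheet record can be read as a (field name, value) pair, joins each pair into a single token, and weights the resulting Bag-of-Words with a smoothed tf-idf variant. The field names, values, and the helper names `flowsheet_to_tokens` and `tfidf` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def flowsheet_to_tokens(rows):
    """Turn (field, value) pairs into free-text-like tokens.

    Assumption: a flowsheet record is a simple (field, value) pair;
    real flowsheet layouts are site-specific and may differ.
    """
    tokens = []
    for field, value in rows:
        # Lowercase and replace whitespace so each pair becomes one token.
        field_tok = "_".join(str(field).lower().split())
        value_tok = "_".join(str(value).lower().split())
        tokens.append(f"{field_tok}={value_tok}")
    return tokens


def tfidf(docs):
    """Compute tf-idf weights for a list of token lists.

    Assumption: raw term counts with a smoothed idf,
    idf = ln((1 + n_docs) / (1 + df)) + 1.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each token once per document
    weights = []
    for doc in docs:
        term_counts = Counter(doc)
        weights.append({
            tok: count * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0)
            for tok, count in term_counts.items()
        })
    return weights


# Hypothetical triage visits, one list of flowsheet pairs per visit.
visits = [
    [("Pain Score", 7), ("Arrival Mode", "Ambulance")],
    [("Pain Score", 2), ("Arrival Mode", "Walk in")],
    [("Pain Score", 7), ("Arrival Mode", "Walk in")],
]
docs = [flowsheet_to_tokens(visit) for visit in visits]
weights = tfidf(docs)
print(docs[0])
print(weights[0])
```

Treating each field-value pair as a word means no site-specific data dictionary is needed before modeling: tokens that appear in few visits receive larger weights, and the resulting sparse vectors could then be passed to a classifier such as logistic regression.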

Published in Big Data and Cognitive Computing

ISSN: 2504-2289 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology
Website: http://www.mdpi.com/journal/BDCC