Classification of the Disposition of Patients Hospitalized with COVID-19: Reading Discharge Summaries Using Natural Language Processing

Fernandes, Marta; Sun, Haoqi; Jain, Aayushee; Alabsi, Haitham S; Brenner, Laura N; Ye, Elissa; Ge, Wendong; Collens, Sarah I; Leone, Michael J; Das, Sudeshna; Robbins, Gregory K; Mukerji, Shibani S; Westover, M Brandon

doi:10.2196/25457

JMIR Medical Informatics (Feb 2021)

Journal volume & issue: Vol. 9, no. 2
p. e25457

Abstract

Background: Medical notes are a rich source of patient data; however, the unstructured nature of clinical text has largely precluded the use of these data in large retrospective analyses. Transforming clinical text into structured data can enable large-scale research studies with electronic health record (EHR) data. Natural language processing (NLP) can be used for text information retrieval, reducing the need for labor-intensive chart review. Here we present an application of NLP to the large-scale analysis of medical records of patients hospitalized with COVID-19 at 2 large hospitals.

Objective: Our study goal was to develop an NLP pipeline to classify the discharge disposition (home, inpatient rehabilitation, skilled nursing inpatient facility [SNIF], and death) of patients hospitalized with COVID-19 based on hospital discharge summary notes.

Methods: Text mining and feature engineering were applied to unstructured text from hospital discharge summaries. The study included patients with COVID-19 discharged from 2 hospitals in the Boston, Massachusetts area (Massachusetts General Hospital and Brigham and Women’s Hospital) between March 10, 2020, and June 30, 2020. The data were divided into a training set (70%) and a hold-out test set (30%). Discharge summaries were represented as bags-of-words consisting of single words (unigrams), bigrams, and trigrams. The number of features was reduced during training by excluding n-grams that occurred in fewer than 10% of discharge summaries, and further reduced using least absolute shrinkage and selection operator (LASSO) regularization while training a multiclass logistic regression model. Model performance was evaluated using the hold-out test set.

Results: The study cohort included 1737 adult patients (median age 61 [SD 18] years; 55% men; 45% White and 16% Black; 14% nonsurvivors and 61% discharged home). The model selected 179 of the 1056 engineered features in the vocabulary, which consisted of combinations of unigrams, bigrams, and trigrams.
The features contributing most to the classification of each outcome were “appointments specialty,” “home health,” and “home care” (home); “intubate” and “ARDS” (inpatient rehabilitation); “service” (SNIF); and “brief assessment” and “covid” (death). For prediction of discharge disposition, the model achieved a micro-average area under the receiver operating characteristic curve of 0.98 (95% CI 0.97-0.98) and an average precision of 0.81 (95% CI 0.75-0.84) on the hold-out test set.

Conclusions: A supervised learning–based NLP approach is able to classify the discharge disposition of patients hospitalized with COVID-19. This approach has the potential to accelerate, and increase the scale of, research on patients’ discharge disposition using EHR data.
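The Methods describe representing each discharge summary as a bag of unigrams, bigrams, and trigrams and discarding n-grams that appear in fewer than 10% of summaries. A minimal Python sketch of that vocabulary filter follows. The lowercase alphanumeric tokenizer and the function names (tokenize, build_vocabulary, vectorize) are illustrative assumptions; the abstract does not describe the study's text preprocessing or how its "engineered features" were built. The document-frequency cutoff is computed on training notes only, so the hold-out test set does not influence the vocabulary.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase the note and keep runs of letters and digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n_max=3):
    # Yield every unigram, bigram, ..., n_max-gram as a space-joined string.
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def build_vocabulary(train_docs, min_doc_frac=0.10, n_max=3):
    # Document frequency: number of training notes containing each n-gram at least once.
    doc_freq = Counter()
    for doc in train_docs:
        doc_freq.update(set(ngrams(tokenize(doc), n_max)))
    min_docs = min_doc_frac * len(train_docs)
    # Drop n-grams found in fewer than min_doc_frac of the notes; index the rest.
    kept = sorted(g for g, df in doc_freq.items() if df >= min_docs)
    return {g: j for j, g in enumerate(kept)}


def vectorize(doc, vocab, n_max=3):
    # Bag-of-words count vector over the fitted vocabulary; unseen n-grams are ignored.
    vec = [0] * len(vocab)
    for g in ngrams(tokenize(doc), n_max):
        if g in vocab:
            vec[vocab[g]] += 1
    return vec


# Toy notes (invented for illustration, not study data).
train_notes = [
    "Discharged home with home health services.",
    "Patient intubated for ARDS, transferred to rehab.",
    "Home care arranged; follow-up appointments specialty clinic.",
]
# With only 3 toy notes a 10% cutoff would keep every n-gram, so 0.5 is used here.
vocab = build_vocabulary(train_notes, min_doc_frac=0.5)
print(vocab)                             # {'home': 0}
print(vectorize(train_notes[0], vocab))  # [2]
```

For comparison, scikit-learn's CountVectorizer(ngram_range=(1, 3), min_df=0.1) applies a similar proportional document-frequency cutoff, although its default token pattern ignores single-character tokens, so its vocabulary can differ from this sketch.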
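Feature selection in the study came from LASSO (L1) regularization applied while fitting a multiclass logistic regression. The sketch below illustrates why an L1 penalty removes features: it fits a multinomial (softmax) logistic regression by proximal gradient descent, where the soft-thresholding step sets small coefficients exactly to zero. The solver, the hyperparameters lam, lr, and n_iter, and all helper names are assumptions for illustration; the abstract does not state which solver or penalty strength the authors used, or whether the penalty acted on each class coefficient separately (as here) or was grouped across classes.

```python
import math


def softmax(z):
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def soft_threshold(w, t):
    # Proximal operator of t*|w|: shrink toward zero, return exactly 0 inside [-t, t].
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def predict_proba(W, b, x):
    # Class probabilities for one feature vector x.
    return softmax([sum(wj * xj for wj, xj in zip(W[k], x)) + b[k] for k in range(len(W))])


def fit_lasso_multinomial(X, y, n_classes, lam=0.01, lr=0.1, n_iter=500):
    # Minimizes mean cross-entropy + lam * sum(|W|) by proximal gradient descent (ISTA).
    # The intercepts b are not penalized.
    n, d = len(X), len(X[0])
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(n_iter):
        gW = [[0.0] * d for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for xi, yi in zip(X, y):
            p = predict_proba(W, b, xi)
            for k in range(n_classes):
                # d(loss)/d(logit_k) = p_k - 1[k == y_i], averaged over samples.
                err = (p[k] - (1.0 if k == yi else 0.0)) / n
                gb[k] += err
                for j in range(d):
                    gW[k][j] += err * xi[j]
        for k in range(n_classes):
            b[k] -= lr * gb[k]
            # Gradient step on the smooth loss, then soft-threshold for the L1 term.
            W[k] = [soft_threshold(W[k][j] - lr * gW[k][j], lr * lam) for j in range(d)]
    return W, b


def selected_features(W):
    # A feature survives if at least one class keeps a nonzero coefficient for it.
    return [j for j in range(len(W[0])) if any(row[j] != 0.0 for row in W)]


# Toy data (invented): feature 0 marks class 0, feature 1 marks class 1, neither marks class 2.
X = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 0], [0, 0]]
y = [0, 0, 1, 1, 2, 2]
W, b = fit_lasso_multinomial(X, y, n_classes=3, lam=0.001, lr=0.5, n_iter=2000)
print(selected_features(W))
W_strong, _ = fit_lasso_multinomial(X, y, n_classes=3, lam=100.0, lr=0.5, n_iter=50)
print(selected_features(W_strong))  # []: a strong penalty zeroes every coefficient
```

Here a feature counts as selected when at least one class keeps a nonzero weight for it, which is one plausible reading of "selected 179 of the 1056 engineered features." In practice the penalty strength would be tuned on the training set, for example by cross-validation, before the single evaluation on the hold-out test set.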
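The micro-average AUROC reported in the Results pools all patient-by-outcome pairs: each patient's true disposition is one-hot encoded across the 4 classes, the predicted class probabilities are flattened alongside, and a single binary metric is computed on the pooled pairs. The stdlib-only sketch below follows that standard definition and applies the same pooling to average precision. The function names and the class index order are hypothetical, AUROC counts ties as one half, and average precision uses the non-interpolated step-wise sum. The abstract does not say how its 95% CIs were obtained, so no interval estimate is shown.

```python
def binary_auroc(labels, scores):
    # Probability that a random positive scores above a random negative (ties count 1/2);
    # equals the Mann-Whitney U statistic divided by n_pos * n_neg. O(n_pos * n_neg).
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def binary_average_precision(labels, scores):
    # Non-interpolated AP: sum over distinct score thresholds of
    # (recall_t - recall_{t-1}) * precision_t, scanning scores from high to low.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(order):
        s = scores[order[i]]
        # Items with identical scores share one threshold.
        while i < len(order) and scores[order[i]] == s:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap


def micro_average(metric, y_true, proba, n_classes):
    # One-hot encode each true class and pool all (patient, class) pairs into one binary problem.
    labels, scores = [], []
    for yi, pi in zip(y_true, proba):
        for k in range(n_classes):
            labels.append(1 if yi == k else 0)
            scores.append(pi[k])
    return metric(labels, scores)


# Toy predictions (invented). Assumed index order: 0 home, 1 rehabilitation, 2 SNIF, 3 death.
y_true = [0, 1, 2, 3, 0]
proba = [
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.3, 0.2, 0.4, 0.1],
    [0.1, 0.1, 0.2, 0.6],
    [0.4, 0.3, 0.2, 0.1],
]
print(micro_average(binary_auroc, y_true, proba, 4))
print(micro_average(binary_average_precision, y_true, proba, 4))
```

With scikit-learn, comparable pooled values can be obtained by binarizing the labels with sklearn.preprocessing.label_binarize and passing the indicator matrix to roc_auc_score or average_precision_score with average="micro".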

Published in JMIR Medical Informatics

ISSN: 2291-9694 (Online)
Publisher: JMIR Publications
Country of publisher: Canada
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://medinform.jmir.org
