Analysis of risk factor domains in psychosis patient health records

Eben Holderness; Nicholas Miller; Philip Cawkwell; Kirsten Bolton; Marie Meteer; James Pustejovsky; Mei-Hua Hall

doi:10.1186/s13326-019-0210-8

Journal of Biomedical Semantics (Oct 2019)

Affiliations

Eben Holderness: Psychosis Neurobiology Laboratory, McLean Hospital, Harvard Medical School
Nicholas Miller: Psychosis Neurobiology Laboratory, McLean Hospital, Harvard Medical School
Philip Cawkwell: Psychosis Neurobiology Laboratory, McLean Hospital, Harvard Medical School
Kirsten Bolton: Psychosis Neurobiology Laboratory, McLean Hospital, Harvard Medical School
Marie Meteer: Brandeis University Department of Computer Science
James Pustejovsky: Brandeis University Department of Computer Science
Mei-Hua Hall: Psychosis Neurobiology Laboratory, McLean Hospital, Harvard Medical School

Journal volume & issue: Vol. 10, no. 1, pp. 1–10

Abstract

Background: Readmission after discharge from a hospital is disruptive and costly, regardless of the reason. However, it can be particularly problematic for psychiatric patients, so predicting which patients may be readmitted is critically important but also very difficult. Clinical narratives in psychiatric electronic health records (EHRs) span a wide range of topics and vocabulary; therefore, a psychiatric readmission prediction model must begin with a robust and interpretable topic extraction component.

Results: We designed and evaluated multiple multilayer perceptron and radial basis function neural networks to predict the sentences in a patient’s EHR that are associated with one or more of seven readmission risk factor domains that we identified. In contrast to our baseline cosine similarity model, which is based on the methodologies of prior works, our deep learning approaches achieved considerably better F1 scores (0.83 vs 0.66) while also being more scalable and computationally efficient with large volumes of data. Additionally, we found that integrating clinically relevant multiword expressions during preprocessing improves the accuracy of our models and allows for identifying a wider scope of training data in a semi-supervised setting.

Conclusion: We created a data pipeline for using document vector similarity metrics to perform topic extraction on psychiatric EHR data in service of our long-term goal of creating a readmission risk classifier. We show results for our topic extraction model and identify additional features we will be incorporating in the future.
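As a rough illustration of the kind of cosine similarity baseline the abstract mentions, the sketch below tags a sentence with every domain whose seed text it resembles. This is not the authors' implementation: the bag-of-words vectors, the two domain names, their seed terms, the `tag_sentence` helper, and the 0.2 threshold are all assumptions chosen only to make the idea concrete.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn text into a lowercased bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical domains and seed vocabulary, for illustration only;
# the abstract does not list the seven domains or their terms.
DOMAIN_SEEDS = {
    "substance_use": "alcohol drug use cannabis drinking",
    "mood": "depressed mood anxious sad irritable",
}


def tag_sentence(sentence, seeds=DOMAIN_SEEDS, threshold=0.2):
    """Return, sorted by name, the domains whose seed vector is similar enough.

    A sentence may match several domains or none; the threshold is an
    arbitrary illustrative value.
    """
    sentence_vec = vectorize(sentence)
    scores = {name: cosine(sentence_vec, vectorize(seed))
              for name, seed in seeds.items()}
    return sorted(name for name, score in scores.items() if score >= threshold)


if __name__ == "__main__":
    print(tag_sentence("patient reports daily alcohol use"))
```

Because each sentence is compared against every domain independently, a single sentence can carry more than one label, which matches the abstract's "one or more" framing. A neural classifier such as the multilayer perceptron the authors evaluated would replace the fixed threshold with a learned decision function.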

Published in Journal of Biomedical Semantics

ISSN: 2041-1480 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://jbiomedsem.biomedcentral.com
