Improving ascertainment of suicidal ideation and suicide attempt with natural language processing

Cosmin A. Bejan; Michael Ripperger; Drew Wilimitis; Ryan Ahmed; JooEun Kang; Katelyn Robinson; Theodore J. Morley; Douglas M. Ruderfer; Colin G. Walsh

doi:10.1038/s41598-022-19358-3

Scientific Reports (Sep 2022)

Affiliations

Cosmin A. Bejan: Department of Biomedical Informatics, Vanderbilt University Medical Center, Vanderbilt University School of Medicine
Michael Ripperger: Department of Biomedical Informatics, Vanderbilt University Medical Center, Vanderbilt University School of Medicine
Drew Wilimitis: Department of Biomedical Informatics, Vanderbilt University Medical Center, Vanderbilt University School of Medicine
Ryan Ahmed: Department of Medicine, Vanderbilt University Medical Center
JooEun Kang: Division of Genetic Medicine, Department of Medicine, Vanderbilt Genetics Institute, Vanderbilt University Medical Center
Katelyn Robinson: Department of Biomedical Informatics, Vanderbilt University Medical Center, Vanderbilt University School of Medicine
Theodore J. Morley: Division of Genetic Medicine, Department of Medicine, Vanderbilt Genetics Institute, Vanderbilt University Medical Center
Douglas M. Ruderfer: Department of Biomedical Informatics, Vanderbilt University Medical Center, Vanderbilt University School of Medicine
Colin G. Walsh: Department of Biomedical Informatics, Vanderbilt University Medical Center, Vanderbilt University School of Medicine

Journal volume & issue: Vol. 12, no. 1, pp. 1–11

Abstract

Methods that rely on diagnostic codes to identify suicidal ideation and suicide attempt in Electronic Health Records (EHRs) at scale are suboptimal because suicide-related outcomes are heavily under-coded. We propose to improve the ascertainment of suicidal outcomes using natural language processing (NLP). We developed information retrieval methodologies to search over 200 million notes from the Vanderbilt EHR. Suicide query terms were extracted using word2vec. A weakly supervised approach was designed to label cases of suicidal outcomes. The NLP validation of the top 200 retrieved patients showed high performance for suicidal ideation (area under the receiver operating characteristic curve [AUROC]: 98.6, 95% confidence interval [CI] 97.1–99.5) and suicide attempt (AUROC: 97.3, 95% CI 95.2–98.7). Case extraction performed best when NLP was combined with diagnostic codes and when negated suicide expressions in notes were accounted for. Overall, we demonstrated that scalable and accurate NLP methods can be developed to identify suicidal behavior in EHRs to enhance prevention efforts, predictive models, and precision medicine.
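The abstract reports that case extraction worked best when note-based NLP evidence was combined with diagnostic codes and when negated expressions were taken into account. The following is a minimal Python sketch of that general idea only, not the authors' pipeline. The term list, the negation cues, the three-word look-back window, and the function names `count_mentions` and `label_patients` are all illustrative assumptions. The study itself derived query terms with word2vec and used a weakly supervised labeling approach, neither of which is reproduced here.

```python
import re
from collections import defaultdict

# Hypothetical query terms and negation cues (assumptions for illustration;
# the study derived its query terms with word2vec, which is not shown here).
QUERY_TERMS = ["suicidal ideation", "suicidal thoughts", "suicide attempt"]
NEGATION_CUES = ["denies", "denied", "no", "without", "negative for"]


def count_mentions(note, terms=QUERY_TERMS, cues=NEGATION_CUES, window=3):
    """Return (affirmed, negated) counts of query-term mentions in one note.

    A mention counts as negated when a negation cue occurs among the
    `window` words immediately preceding it. This is a deliberately crude
    rule that ignores sentence boundaries.
    """
    text = note.lower()
    affirmed = negated = 0
    for term in terms:
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            preceding = re.findall(r"[a-z']+", text[:match.start()])[-window:]
            context = " ".join(preceding)
            if any(re.search(r"\b" + re.escape(cue) + r"\b", context) for cue in cues):
                negated += 1
            else:
                affirmed += 1
    return affirmed, negated


def label_patients(notes, coded_patients):
    """Combine note evidence with diagnostic-code evidence per patient.

    notes: iterable of (patient_id, note_text) pairs.
    coded_patients: set of patient ids that carry a relevant diagnostic code.
    A patient is flagged as a case if any note has an affirmed mention or
    if a diagnostic code is present (an assumed combination rule).
    """
    summary = defaultdict(lambda: {"affirmed": 0, "negated": 0})
    for patient_id, text in notes:
        a, n = count_mentions(text)
        summary[patient_id]["affirmed"] += a
        summary[patient_id]["negated"] += n

    result = {}
    for patient_id in set(summary) | set(coded_patients):
        counts = summary[patient_id]
        coded = patient_id in coded_patients
        result[patient_id] = {
            "affirmed": counts["affirmed"],
            "negated": counts["negated"],
            "coded": coded,
            "case": counts["affirmed"] > 0 or coded,
        }
    return result
```

Under these assumptions, a patient whose only mention is "denies suicidal ideation" is not flagged, while a patient with a diagnostic code but no note mention still is, which mirrors the motivation for using both sources together.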

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
