Mayo Clinic Proceedings: Digital Health (Dec 2023)

Building a Natural Language Processing Artificial Intelligence to Predict Suicide-Related Events Based on Patient Portal Message Data

  • Archis R. Bhandarkar, MD, MS,
  • Namrata Arya, BS,
  • Keldon K. Lin, BA,
  • Frederick North, MD,
  • Michelle J. Duvall, MD,
  • Nathaniel E. Miller, MD,
  • Jennifer L. Pecina, MD

Journal volume & issue
Vol. 1, no. 4
pp. 510 – 518

Abstract

Objective: To develop a natural language processing artificial intelligence model trained on text from patient portal messages to predict 30-day suicide-related events (SREs).

Patients and Methods: Patient portal messages sent by patients between January 1, 2013, and October 31, 2017, were screened for an associated SRE within 30 days. For both the patient portal messages associated with a 30-day SRE and a randomized control set, we automatically extracted several features: (1) frequencies of keywords; (2) message metadata; and (3) message sentiment.

Results: A total of 840 patient portal messages were included in our final analysis: 420 messages with an associated 30-day SRE and 420 without. Patient messages with an associated 30-day SRE had a lower mean sentiment score than those without an SRE (P<.001). Messages with an associated 30-day SRE had greater word counts (P=.002) and more use of ellipses (P=.02), but less use of exclamation marks (P=.04) and question marks (P=.007), than messages without a 30-day SRE. Of the machine learning models evaluated, the neural network had the highest area under the receiver operating characteristic curve, at 0.710, with a sensitivity of 56.0% and a specificity of 69.0%.

Conclusion: A natural language processing artificial intelligence model trained on a subset of patient portal message data was able to predict 30-day SREs at a level comparable to commonly used suicide assessment tools. Predictors that conveyed the overall tone of a patient message, such as the sentiment score, were weighted more highly by the machine learning models in predicting 30-day SREs than the frequencies of individual words.
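
The three feature groups named in Patients and Methods (keyword frequencies, message metadata, and message sentiment) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the abstract does not specify the keyword list, the sentiment tool, or the exact metadata fields, so `KEYWORDS`, `SENTIMENT_LEXICON`, and `extract_features` are hypothetical names chosen for illustration, and the sentiment score here is a toy lexicon average rather than the method used in the study.

```python
import re
from collections import Counter

# Hypothetical keyword list and sentiment lexicon, for illustration only;
# the abstract does not state which keywords or sentiment tool were used.
KEYWORDS = ("pain", "sleep", "help")
SENTIMENT_LEXICON = {
    "good": 1, "better": 1, "thanks": 1,
    "bad": -1, "worse": -1, "hopeless": -1,
}


def extract_features(message: str) -> dict:
    """Return a flat feature dictionary for one portal message."""
    # Lowercase word tokens (letters and apostrophes only).
    tokens = re.findall(r"[a-z']+", message.lower())
    counts = Counter(tokens)
    n = len(tokens)

    features = {
        # Metadata-style features reported in the Results.
        "word_count": n,
        "ellipsis_count": len(re.findall(r"\.{3,}|…", message)),
        "exclamation_count": message.count("!"),
        "question_count": message.count("?"),
        # Toy sentiment: mean lexicon score per token (0.0 for empty text).
        "sentiment": (
            sum(SENTIMENT_LEXICON.get(t, 0) for t in tokens) / n if n else 0.0
        ),
    }
    # Relative frequency of each (hypothetical) keyword.
    for kw in KEYWORDS:
        features[f"kw_{kw}"] = counts[kw] / n if n else 0.0
    return features


if __name__ == "__main__":
    print(extract_features("I feel worse... can you help? Thanks!"))
```

Feature dictionaries of this kind could then be stacked into a matrix and passed to any standard classifier; which classifier and which hyperparameters the study used beyond "neural network" are not stated in the abstract.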