ACR Open Rheumatology (Aug 2024)

Classifying Individuals With Rheumatic Conditions as Financially Insecure Using Electronic Health Record Data and Natural Language Processing: Algorithm Derivation and Validation

  • Mia T. Chandler,
  • Tianrun Cai,
  • Leah Santacroce,
  • Sciaska Ulysse,
  • Katherine P. Liao,
  • Candace H. Feldman

DOI
https://doi.org/10.1002/acr2.11675
Journal volume & issue
Vol. 6, no. 8
pp. 481 – 488

Abstract


Objective We aimed to examine the feasibility of applying natural language processing (NLP) to unstructured electronic health record (EHR) documents to detect financial insecurity among patients with rheumatologic disease enrolled in an integrated care management program (iCMP).

Methods We combined supervised, rule‐based NLP with statistical methods to identify financial insecurity among patients with rheumatic conditions enrolled in an iCMP (n = 20,395) in a multihospital EHR system. We constructed a lexicon for financial insecurity from available knowledge sources and then reviewed EHR notes from 538 randomly selected individuals (training cohort n = 366, validation cohort n = 172). We manually categorized records as having a “definite,” “possible,” or “no” mention of financial insecurity. All available notes were processed using Narrative Information Linear Extraction, a rule‐based NLP system. Logistic regression, least absolute shrinkage and selection operator (LASSO), and random forest models were trained on the NLP features for financial insecurity, and their performance characteristics were compared with the reference standard.

Results A total of 245,142 notes were processed from 538 individual patient records. Financial insecurity was present among 100 (27%) individuals in the training cohort and 63 (37%) in the validation cohort. The LASSO and random forest models performed identically and slightly better than logistic regression, each with a positive predictive value of 0.90, a sensitivity of 0.29, and a specificity of 0.98.

Conclusion Developing a context‐driven lexicon for use with rule‐based NLP to extract data that identify financial insecurity is feasible and improved the capture of financial insecurity with high accuracy. In the absence of a standard lexicon and construct definition for financial insecurity status, additional studies are needed to optimize the sensitivity of algorithms that categorize financial insecurity with construct validity.
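The abstract reports each model as a positive predictive value (PPV), a sensitivity, and a specificity measured against manual chart review. The sketch below illustrates that evaluation under stated assumptions. It is not the authors' pipeline. The lexicon terms, the helper names (`count_lexicon_mentions`, `to_binary`, `classification_metrics`), the decision to count only "definite" as positive, and the "any lexicon hit" rule are all hypothetical. The matcher is also a naive regex stand-in, not Narrative Information Linear Extraction.

```python
import re
from collections import Counter

# HYPOTHETICAL example terms; the study's lexicon is not reproduced in the abstract.
LEXICON = ["cannot afford", "financial strain", "unable to pay", "copay assistance"]


def count_lexicon_mentions(notes, lexicon=LEXICON):
    """Count case-insensitive, whole-phrase matches of each lexicon term across notes.

    This is a naive regex matcher, not NILE: it has no negation or context
    handling, so "denies financial strain" would still be counted.
    """
    counts = Counter({term: 0 for term in lexicon})
    for note in notes:
        text = note.lower()
        for term in lexicon:
            pattern = r"\b" + re.escape(term.lower()) + r"\b"
            counts[term] += len(re.findall(pattern, text))
    return counts


def to_binary(label, positive=("definite",)):
    """Collapse a chart-review label to 1/0 (ASSUMPTION: only "definite" is positive)."""
    return 1 if label in positive else 0


def classification_metrics(y_true, y_pred):
    """Return (PPV, sensitivity, specificity) for binary labels, 1 = financially insecure."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    nan = float("nan")
    ppv = tp / (tp + fp) if tp + fp else nan
    sensitivity = tp / (tp + fn) if tp + fn else nan
    specificity = tn / (tn + fp) if tn + fp else nan
    return ppv, sensitivity, specificity


if __name__ == "__main__":
    # Toy, made-up data: patient id -> (chart-review label, list of notes).
    patients = {
        "p1": ("definite", ["Patient states she cannot afford her biologic."]),
        "p2": ("no", ["Follow-up for RA, doing well."]),
        "p3": ("possible", ["Discussed copay assistance options."]),
        "p4": ("definite", ["Lost job recently; worried about bills."]),
        "p5": ("no", ["Lupus stable on hydroxychloroquine."]),
    }
    y_true, y_pred = [], []
    for label, notes in patients.values():
        y_true.append(to_binary(label))
        # HYPOTHETICAL rule: flag any lexicon hit (the study instead trained
        # logistic, LASSO, and random forest models on NLP features).
        y_pred.append(1 if sum(count_lexicon_mentions(notes).values()) > 0 else 0)
    ppv, sens, spec = classification_metrics(y_true, y_pred)
    print(f"PPV={ppv:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
    # prints "PPV=0.50 sensitivity=0.50 specificity=0.67"
```

In the toy data, patient p4 is financially insecure but no note uses a lexicon phrase, so it becomes a false negative. Gaps of this kind in lexicon coverage are one way a classifier can reach a high PPV and specificity while its sensitivity stays low, as in the reported 0.90 / 0.29 / 0.98 profile.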