RMD Open (Jul 2024)

Natural language processing to identify and characterize spondyloarthritis in clinical practice

  • Loreto Carmona,
  • Victoria Navarro-Compán,
  • Eugenio De Miguel,
  • Diego Benavent,
  • María Benavent-Núñez,
  • Judith Marin-Corral,
  • Javier Arias-Manjón,
  • Miren Taberna,
  • Ignacio Salcedo,
  • Iago Romero,
  • Sebastian Menke,
  • David Casadevall,
  • Natalia Polo,
  • Guillermo Argüello

DOI
https://doi.org/10.1136/rmdopen-2024-004302
Journal volume & issue
Vol. 10, no. 2

Abstract


Objective This study aims to use a novel technology based on natural language processing (NLP) to extract clinical information from electronic health records (EHRs) and characterise the clinical profile of patients diagnosed with spondyloarthritis (SpA) at a large hospital.

Methods An observational, retrospective analysis was conducted on EHR data from all patients with SpA (including psoriatic arthritis (PsA)) at Hospital Universitario La Paz between 2020 and 2022. Data were collected using Savana Manager, an NLP-based system that extracts information from unstructured, free-text EHRs. Variables analysed included demographic data, SpA subtypes, comorbidities and treatments. The performance of the technology in detecting SpA clinical entities was evaluated using precision, recall and F1 score.

Results Of a hospital population of 639 474 patients, 4337 (0.7%) had a diagnosis of SpA or one of its subtypes recorded in their EHR. Men predominated (55.3%), and the mean age was 50.9 years. Peripheral SpA (including PsA) was reported in 31.6% of patients, axial SpA in 20.9% and both axial and peripheral SpA in 3.7%, while the SpA subtype was not reported for 43.7%. Common comorbidities included hypertension (25.0%), dyslipidaemia (22.2%) and diabetes mellitus (15.5%). Use of conventional synthetic disease-modifying antirheumatic drugs (csDMARDs) and biological DMARDs (bDMARDs) was documented; methotrexate (25.3% of patients) was the most used csDMARD and adalimumab (10.6% of patients) the most used bDMARD. The NLP technology showed high precision and recall, with all assessed F1 scores above 0.80, indicating reliable data extraction.

Conclusion Applying NLP technology made it possible to characterise the SpA patient profile, including demographics, clinical features, comorbidities and treatments. This study supports the utility of NLP for improving the understanding of SpA and suggests that extracting meaningful information from unstructured EHR data could help improve patient management.
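The abstract reports precision, recall and F1 score but does not restate their definitions. As a minimal sketch, assuming the usual entity-level definitions (true positives, false positives and false negatives counted against a manually annotated reference), the three metrics could be computed as below. The class and function names are hypothetical, and the counts are illustrative only, not data from the study; the abstract does not describe the annotation protocol.

```python
from dataclasses import dataclass


@dataclass
class EntityCounts:
    """Hypothetical confusion counts for one clinical entity, e.g. "axial SpA"."""
    true_positives: int   # mentions extracted by the system and confirmed by annotators
    false_positives: int  # mentions extracted by the system but rejected by annotators
    false_negatives: int  # mentions found by annotators but missed by the system


def precision(c: EntityCounts) -> float:
    """Share of extracted mentions that are correct: TP / (TP + FP)."""
    extracted = c.true_positives + c.false_positives
    return c.true_positives / extracted if extracted else 0.0


def recall(c: EntityCounts) -> float:
    """Share of reference mentions that were extracted: TP / (TP + FN)."""
    relevant = c.true_positives + c.false_negatives
    return c.true_positives / relevant if relevant else 0.0


def f1_score(c: EntityCounts) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    # Illustrative counts only, not results from the study.
    counts = EntityCounts(true_positives=90, false_positives=10, false_negatives=20)
    print(f"precision={precision(counts):.3f}")  # precision=0.900
    print(f"recall={recall(counts):.3f}")        # recall=0.818
    print(f"F1={f1_score(counts):.3f}")          # F1=0.857
```

With these made-up counts, F1 = 2·TP / (2·TP + FP + FN) = 180/210 ≈ 0.857, which would clear the 0.80 level that all F1 scores assessed in the study exceeded. The harmonic mean penalises an imbalance between precision and recall more than an arithmetic mean would.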