Prevalence and clinical characteristics of patients with rheumatoid arthritis with interstitial lung disease using unstructured healthcare data and machine learning
Raul Castellanos-Moreira,
Diego Benavent,
Alejandra López Robles,
Ernesto Trallero-Araguás,
Lucía Silva-Fernández,
Juliana Restrepo,
Jose A Román Ivorra,
Maria Lopez Lasanta,
Laura Cebrián,
Leticia Lojo,
Belén López-Muñíz,
Julia Fernández-Melon,
Belén Núñez,
Raúl Veiga Cabello,
Pilar Ahijado,
Isabel De la Morena Barrio,
Nerea Costas Torrijo,
Belén Safont,
Enrique Ornilla,
Arantxa Campo,
Jose L Andreu,
Elvira Díez,
Elena Bollo,
David Vilanova,
Sara Luján Valdés
Affiliations
Raul Castellanos-Moreira
Hospital Clínic de Barcelona, Rheumatology, Barcelona, Spain
Diego Benavent
Department of Rheumatology, Bellvitge University Hospital, L'Hospitalet de Llobregat, Spain
Alejandra López Robles
Rheumatology Department, Complejo Asistencial Universitario de Leon, Leon, Spain
Ernesto Trallero-Araguás
Systemic Autoimmune Disease Section, Vall d’Hebron Institute of Research, Barcelona, Spain
Lucía Silva-Fernández
Rheumatology Department, Hospital Universitario Son Espases, Palma, Spain
Juliana Restrepo
Rheumatology Department, Clinica Universidad de Navarra, Pamplona, Spain
Jose A Román Ivorra
Rheumatology Department, Hospital Politécnico y Universitario La Fe, Valencia, Spain
Objectives Real-world data on rheumatoid arthritis (RA) and its association with interstitial lung disease (ILD) are still scarce. This study aimed to estimate the prevalence of RA and of ILD in patients with RA (RAILD) in Spain, and to compare the clinical characteristics of patients with RA with and without ILD, using natural language processing (NLP) on electronic health records (EHRs).
Methods Observational, retrospective, multicentre case–control study based on the secondary use of unstructured clinical data from patients with adult RA and RAILD from nine hospitals between 2014 and 2019. NLP was used to extract unstructured clinical information from EHRs and standardise it to SNOMED-CT terminology. The prevalence of RA and RAILD was calculated, and a descriptive analysis was performed. Characteristics of patients with RAILD and patients with RA without ILD (RAnonILD) were compared.
Results From a source population of 3 176 165 patients and 64 241 683 EHRs, 13 958 patients with RA were identified. Of those, 5.1% of patients additionally had ILD (RAILD). The overall age-adjusted prevalence of RA and RAILD was 0.53% and 0.02%, respectively. The most common ILD subtype was usual interstitial pneumonia (29.3%). Compared with RAnonILD patients, RAILD patients were older and had more comorbidities, notably infections (33.6% vs 16.5%, p<0.001), malignancies (15.9% vs 8.5%, p<0.001) and cardiovascular disease (25.8% vs 13.9%, p<0.001). RAILD patients also had a higher inflammatory burden, reflected in more pharmacological prescriptions and higher inflammatory parameters, and had higher in-hospital mortality, with a higher risk of death (HR 2.32; 95% CI 1.59 to 2.81, p<0.001).
Conclusions We estimated the age-adjusted prevalence of RA and RAILD by analysing real-world data through NLP. RAILD patients were more vulnerable at the time of inclusion, with higher comorbidity and inflammatory burden than RAnonILD patients, which correlated with higher mortality.
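As a rough check on the counts reported above, the crude (unadjusted) prevalence can be recomputed from the abstract's own figures. This is a minimal sketch, not the authors' method: the function and variable names are hypothetical, and the RAILD patient count is an approximation derived from the rounded 5.1% share, since the abstract does not give the exact number.

```python
# Figures taken directly from the abstract.
SOURCE_POPULATION = 3_176_165   # patients in the source population
RA_PATIENTS = 13_958            # patients identified with RA
RAILD_SHARE_OF_RA = 0.051       # 5.1% of RA patients additionally had ILD


def crude_prevalence_pct(cases: int, population: int) -> float:
    """Return crude (not age-adjusted) prevalence as a percentage."""
    return 100.0 * cases / population


# Assumption: the RAILD count is reconstructed from the rounded 5.1% figure,
# so it may differ slightly from the study's exact count.
approx_raild_patients = round(RA_PATIENTS * RAILD_SHARE_OF_RA)

crude_ra_pct = crude_prevalence_pct(RA_PATIENTS, SOURCE_POPULATION)
crude_raild_pct = crude_prevalence_pct(approx_raild_patients, SOURCE_POPULATION)

print(f"Approximate RAILD patients: {approx_raild_patients}")
print(f"Crude RA prevalence: {crude_ra_pct:.2f}%")
print(f"Crude RAILD prevalence: {crude_raild_pct:.2f}%")
```

The crude RA figure need not match the reported age-adjusted 0.53%, because age adjustment reweights age-specific rates against a reference age distribution that the abstract does not provide.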