ACR Open Rheumatology (Dec 2022)

Machine Learning Applied to Patient‐Reported Outcomes to Classify Physician‐Derived Measures of Rheumatoid Arthritis Disease Activity

  • Jeffrey R. Curtis,
  • Yujie Su,
  • Shawn Black,
  • Stephen Xu,
  • Wayne Langholff,
  • Clifton O. Bingham,
  • Shelly Kafka,
  • Fenglong Xie

DOI
https://doi.org/10.1002/acr2.11499
Journal volume & issue
Vol. 4, no. 12
pp. 995 – 1003

Abstract

Objective: Patient‐reported outcome (PRO) data have assumed increasing importance in the care of patients with rheumatoid arthritis (RA), yet physician‐derived disease activity measures, such as the Clinical Disease Activity Index (CDAI), remain the most accepted metrics for assessing disease activity. The possibility that newer longitudinal PRO data might be used as a proxy for the CDAI has not been evaluated.

Methods: Using data from a large pragmatic trial, we evaluated patients with RA initiating intravenous golimumab or infliximab. The classification target was low disease activity (LDA) (CDAI ≤10) at the first visit between months 3 and 12. Data were randomly partitioned into training (80%) and test (20%) data sets. Multiple machine learning (ML) methods (eg, random forests, gradient boosting, support vector machines) were used to classify CDAI disease activity category, conduct feature selection, and assess feature importance. Model performance was evaluated by cross‐validated error, and the different ML approaches were compared using both training and test data.

Results: A total of 494 patients were analyzed, and 36.4% achieved LDA. The most important classification features included several Patient‐Reported Outcomes Measurement Information System measures (social participation, pain interference, pain intensity, and physical function), patient global, and baseline CDAI. Among all ML methods, random forests performed best. Overall model accuracy and positive predictive values for all ML methods were approximately 80%.

Conclusion: ML methods coupled with longitudinal PRO data appear useful and can achieve reasonable accuracy in classifying LDA among patients starting a new biologic. This approach has promise for real‐world evidence generation in the common circumstance in which physician‐derived disease activity data are not available but PRO measures are.
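
The classification target, the 80/20 partition, and the two reported metrics can be illustrated with a minimal Python sketch. It is not the authors' code and does not reproduce their ML models. The CDAI formula used here is the standard definition (tender and swollen 28-joint counts plus patient and provider global scores on 0 to 10 scales), which the abstract does not spell out, and all function names are hypothetical.

```python
import random

# CDAI <= 10 is the LDA classification target stated in the abstract.
LDA_THRESHOLD = 10.0


def cdai(tender28, swollen28, patient_global, provider_global):
    """Standard CDAI: 28-joint tender and swollen counts plus two 0-10 global scores.

    Assumption: this is the usual published definition, not quoted from the abstract.
    """
    return tender28 + swollen28 + patient_global + provider_global


def is_lda(score):
    """Return True when a CDAI score meets the low disease activity target."""
    return score <= LDA_THRESHOLD


def train_test_split(records, test_fraction=0.2, seed=0):
    """Randomly partition records into (train, test); the seed is arbitrary."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy_and_ppv(y_true, y_pred):
    """Accuracy and positive predictive value for boolean LDA labels."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t and p)
    false_pos = sum(1 for t, p in pairs if not t and p)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    predicted_pos = true_pos + false_pos
    ppv = true_pos / predicted_pos if predicted_pos else float("nan")
    return accuracy, ppv


# Usage with placeholder patient identifiers (494 matches the analyzed cohort size).
train_ids, test_ids = train_test_split(range(494))
```

In a full analysis, a classifier trained on PRO features from the training partition would supply the predicted labels passed to `accuracy_and_ppv`; here that step is left out because the abstract does not specify the model configuration.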