Arthritis Research & Therapy (May 2022)

Identification of biomarkers by machine learning classifiers to assist diagnose rheumatoid arthritis-associated interstitial lung disease

  • Yan Qin,
  • Yanlin Wang,
  • Fanxing Meng,
  • Min Feng,
  • Xiangcong Zhao,
  • Chong Gao,
  • Jing Luo

DOI
https://doi.org/10.1186/s13075-022-02800-2
Journal volume & issue
Vol. 24, no. 1
pp. 1 – 12

Abstract

Background: This study aimed to identify blood biomarkers in patients with rheumatoid arthritis-associated interstitial lung disease (RA-ILD) using machine learning classifiers, and to probe correlations between the markers and the characteristics of RA-ILD.

Methods: A total of 153 RA patients were enrolled, including 75 with RA-ILD and 78 without ILD (RA-non-ILD). Routine laboratory data, the levels of tumor markers and autoantibodies, and clinical manifestations were recorded. Univariate analysis, least absolute shrinkage and selection operator (LASSO), random forest (RF), and partial least squares (PLS) analyses were performed, and receiver operating characteristic (ROC) curves were plotted.

Results: Univariate analysis showed that, compared to RA-non-ILD patients, patients with RA-ILD were older (p < 0.001); had higher white blood cell (p = 0.003) and neutrophil counts (p = 0.017); had a higher erythrocyte sedimentation rate (p = 0.003) and C-reactive protein level (p = 0.003); and had higher levels of KL-6 (p < 0.001), D-dimer (p < 0.001), fibrinogen (p < 0.001), fibrinogen degradation products (p < 0.001), lactate dehydrogenase (p < 0.001), hydroxybutyrate dehydrogenase (p < 0.001), carbohydrate antigen (CA) 19-9 (p < 0.001), carcinoembryonic antigen (p = 0.001), and CA242 (p < 0.001), but a significantly lower albumin level (p = 0.003). The areas under the curves (AUCs) of the LASSO, RF, and PLS models reached 0.95 for differentiating patients with RA-ILD from those without. When data from the univariate analysis and the top 10 indicators of the three machine learning models were combined, the most discriminatory markers were age, KL-6, D-dimer, and CA19-9, with AUCs of 0.814 [95% confidence interval (CI) 0.731–0.880], 0.749 (95% CI 0.660–0.824), 0.749 (95% CI 0.660–0.824), and 0.727 (95% CI 0.637–0.805), respectively. When all four markers were combined, the AUC reached 0.928 (95% CI 0.865–0.968). Notably, neither the KL-6 nor the CA19-9 level correlated with disease activity in the RA-ILD group.

Conclusions: The levels of KL-6, D-dimer, and tumor markers greatly aided RA-ILD identification. Machine learning algorithms combined with traditional biostatistical analysis can diagnose patients with RA-ILD and identify biomarkers potentially associated with the disease.
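
For readers unfamiliar with the AUC values quoted above, the following minimal Python sketch shows one standard way to compute the AUC of a single marker: as the fraction of (patient with ILD, patient without ILD) pairs in which the patient with ILD has the higher value, with ties counted as half. This is an illustration only and not the authors' analysis pipeline; the function name `roc_auc` and the marker values are hypothetical and invented for this example.

```python
def roc_auc(positives, negatives):
    """Pairwise (Mann-Whitney) AUC: share of positive/negative pairs
    in which the positive case has the higher marker value.
    Ties contribute half a pair."""
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up marker values for illustration only; NOT data from the study.
toy_ild = [620, 480, 910, 350, 700]
toy_non_ild = [210, 390, 300, 520, 180]

auc = roc_auc(toy_ild, toy_non_ild)
print(f"toy AUC = {auc:.2f}")  # prints "toy AUC = 0.88"
```

An AUC of 0.5 corresponds to a marker that ranks the two groups no better than chance, and 1.0 to perfect separation, which is the scale on which the reported single-marker and combined-marker AUCs should be read.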

Keywords