Diagnostics (Oct 2021)

A Machine Learning Application to Predict Early Lung Involvement in Scleroderma: A Feasibility Evaluation

  • Giuseppe Murdaca,
  • Simone Caprioli,
  • Alessandro Tonacci,
  • Lucia Billeci,
  • Monica Greco,
  • Simone Negrini,
  • Giuseppe Cittadini,
  • Patrizia Zentilin,
  • Elvira Ventura Spagnolo,
  • Sebastiano Gangemi

DOI
https://doi.org/10.3390/diagnostics11101880
Journal volume & issue
Vol. 11, no. 10
p. 1880

Abstract

Introduction: Systemic sclerosis (SSc) is a systemic immune-mediated disease featuring fibrosis of the skin and organs, and it has the highest mortality among rheumatic diseases. Nervous system involvement has recently been demonstrated, but lung involvement is considered the leading cause of death in SSc and should therefore be diagnosed early. Pulmonary function tests are not sensitive enough for screening, so they should be complemented by other clinical examinations; however, this carries a risk of overtesting, with considerable costs for the health system and an unnecessary burden for patients. To this end, Machine Learning (ML) algorithms could be a useful add-on to current clinical practice for diagnosis and could help identify the most informative examinations to perform. Method: Here, we retrospectively collected high-resolution computed tomography, pulmonary function test, esophageal pH impedance test, esophageal manometry and reflux disease questionnaire data from 38 patients with SSc and applied, in R, several supervised ML algorithms, including lasso, ridge, elastic net, classification and regression trees (CART) and random forest, to estimate the most important predictors of pulmonary involvement from these data. Results: In terms of performance, the random forest algorithm outperformed the other classifiers, with an estimated root-mean-square error (RMSE) of 0.810. However, this algorithm proved computationally intensive, so other classifiers may remain useful when a shorter response time is needed. Conclusions: Despite the notably small sample size, which may have prevented fully reliable results, the powerful tools available for ML can be useful for predicting early lung involvement in SSc patients. Predictors from spirometry and pH impedance testing, used together, might perform optimally for predicting early lung involvement in SSc.
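
The abstract ranks the algorithms by root-mean-square error (RMSE), where a lower value means predictions sit closer, on average, to the observed values. The following is a minimal sketch of that comparison in Python rather than the R used in the study. The `rmse` helper, the model names and all numbers are hypothetical and made up for illustration; they are not data or code from the paper.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one observation is required")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values for illustration only; NOT data from the study.
observed = [2.0, 3.0, 1.0, 4.0]
candidates = {
    "model_a": [2.5, 2.5, 1.5, 3.5],  # hypothetical predictions
    "model_b": [1.0, 4.0, 2.0, 3.0],  # hypothetical predictions
}

# Score every candidate, then pick the one with the smallest error.
scores = {name: rmse(observed, preds) for name, preds in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

With only 38 patients, as in the study, such a score would normally be estimated on held-out or cross-validated predictions rather than on the data used for fitting; the abstract does not state which procedure was used.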

Keywords