Journal of Clinical Medicine (Nov 2023)

Multivariable Risk Modelling and Survival Analysis with Machine Learning in SARS-CoV-2 Infection

  • Andrea Ciarmiello,
  • Francesca Tutino,
  • Elisabetta Giovannini,
  • Amalia Milano,
  • Matteo Barattini,
  • Nikola Yosifov,
  • Debora Calvi,
  • Maurizo Setti,
  • Massimiliano Sivori,
  • Cinzia Sani,
  • Andrea Bastreri,
  • Raffaele Staffiere,
  • Teseo Stefanini,
  • Stefania Artioli,
  • Giampiero Giovacchini

DOI
https://doi.org/10.3390/jcm12227164
Journal volume & issue
Vol. 12, no. 22
p. 7164

Abstract


Aim: To evaluate the performance of a machine learning model based on demographic variables, blood tests, pre-existing comorbidities, and computed tomography (CT)-based radiomic features in predicting critical outcome in patients with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. Methods: We retrospectively enrolled 694 SARS-CoV-2-positive patients. Clinical and demographic data were extracted from clinical records. Radiomic data were extracted from CT. Patients were randomly assigned to the training (80%, n = 556) or test (20%, n = 138) dataset. The training set was used to define the association between severity of disease and comorbidities, laboratory tests, demographic variables, and CT-based radiomic variables, and to implement a risk-prediction model. The model was evaluated using the C statistic and the Brier score. The test set was used to assess model prediction performance. Results: Patients who died (n = 157) were predominantly male (66%) and over the age of 50, with median [range] C-reactive protein (CRP) = 5 [1, 37] mg/dL, lactate dehydrogenase (LDH) = 494 [141, 3631] U/L, and D-dimer = 6006 [168, 152,015] ng/mL. Surviving patients (n = 537) had median [range] CRP = 3 [0, 27] mg/dL, LDH = 484 [78, 3745] U/L, and D-dimer = 1133 [96, 55,660] ng/mL. The strongest risk factors were D-dimer, age, and cardiovascular disease. The model built from the variables selected by LASSO Cox regression analysis classified 90% of non-survivors as high-risk individuals in the test dataset. In this sample, the estimated median survival in the high-risk group was 9 days (95% CI: 9–37), while the low-risk group did not reach median survival, that is, its estimated survival probability remained above 50% (p < 0.001). Conclusions: A machine learning model based on combined data available in the first days of hospitalization (demographics, CT radiomics, comorbidities, and blood biomarkers) can identify SARS-CoV-2 patients at risk of serious illness and death.
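As an illustration of two quantities named in the abstract, the sketch below computes a C statistic (Harrell's concordance index) and a Kaplan-Meier median survival time in plain Python. It is not the authors' implementation: the function names `harrell_c` and `km_median` and the toy follow-up data are hypothetical, chosen only to show how a group can "not reach" median survival.

```python
from fractions import Fraction
from typing import List, Optional


def harrell_c(times: List[float], events: List[int], risk: List[float]) -> float:
    """Harrell's C: share of comparable pairs in which the patient with the
    higher predicted risk has the earlier observed death (ties count 0.5)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i died strictly before
            # patient j's last follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def km_median(times: List[float], events: List[int]) -> Optional[float]:
    """First time at which the Kaplan-Meier survival estimate is <= 0.5.
    Returns None when survival never falls to 50% (median not reached)."""
    at_risk = len(times)
    surv = Fraction(1)  # exact arithmetic avoids rounding at the 0.5 boundary
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            surv *= Fraction(at_risk - deaths, at_risk)
            if surv <= Fraction(1, 2):
                return t
        at_risk -= leaving
    return None


# Hypothetical toy data: days of follow-up, death indicator (1 = died),
# and a model risk score. These values are invented for illustration.
toy_times = [5, 9, 12, 30, 40, 45]
toy_events = [1, 1, 1, 0, 0, 0]
toy_risk = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]

c_stat = harrell_c(toy_times, toy_events, toy_risk)
median_high = km_median(toy_times[:3], toy_events[:3])  # high-risk half
median_low = km_median(toy_times[3:], toy_events[3:])   # low-risk half
```

In this toy split, `median_low` is `None` because no deaths occur in the low-risk half, which mirrors the abstract's statement that the low-risk group did not reach median survival. A real analysis would typically rely on an established survival library and would also handle confidence intervals and censoring-adjusted Brier scores, which this sketch omits.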

Keywords