Frontiers in Cellular and Infection Microbiology (Apr 2022)

A Comparison of XGBoost, Random Forest, and Nomograph for the Prediction of Disease Severity in Patients With COVID-19 Pneumonia: Implications of Cytokine and Immune Cell Profile

  • Wandong Hong,
  • Xiaoying Zhou,
  • Shengchun Jin,
  • Yajing Lu,
  • Jingyi Pan,
  • Qingyi Lin,
  • Shaopeng Yang,
  • Tingting Xu,
  • Zarrin Basharat,
  • Maddalena Zippi,
  • Sirio Fiorino,
  • Vladislav Tsukanov,
  • Simon Stock,
  • Alfonso Grottesi,
  • Qin Chen,
  • Jingye Pan

DOI
https://doi.org/10.3389/fcimb.2022.819267
Journal volume & issue
Vol. 12

Abstract

Read online

Background and AimsThe aim of this study was to apply machine learning models and a nomogram to differentiate critically ill from non-critically ill COVID-19 pneumonia patients.MethodsClinical symptoms and signs, laboratory parameters, cytokine profile, and immune cellular data of 63 COVID-19 pneumonia patients were retrospectively reviewed. Outcomes were followed up until Mar 12, 2020. A logistic regression function (LR model), Random Forest, and XGBoost models were developed. The performance of these models was measured by area under receiver operating characteristic curve (AUC) analysis.ResultsUnivariate analysis revealed that there was a difference between critically and non-critically ill patients with respect to levels of interleukin-6, interleukin-10, T cells, CD4+ T, and CD8+ T cells. Interleukin-10 with an AUC of 0.86 was most useful predictor of critically ill patients with COVID-19 pneumonia. Ten variables (respiratory rate, neutrophil counts, aspartate transaminase, albumin, serum procalcitonin, D-dimer and B-type natriuretic peptide, CD4+ T cells, interleukin-6 and interleukin-10) were used as candidate predictors for LR model, Random Forest (RF) and XGBoost model application. The coefficients from LR model were utilized to build a nomogram. RF and XGBoost methods suggested that Interleukin-10 and interleukin-6 were the most important variables for severity of illness prediction. The mean AUC for LR, RF, and XGBoost model were 0.91, 0.89, and 0.93 respectively (in two-fold cross-validation). Individualized prediction by XGBoost model was explained by local interpretable model-agnostic explanations (LIME) plot.ConclusionsXGBoost exhibited the highest discriminatory performance for prediction of critically ill patients with COVID-19 pneumonia. It is inferred that the nomogram and visualized interpretation with LIME plot could be useful in the clinical setting. Additionally, interleukin-10 could serve as a useful predictor of critically ill patients with COVID-19 pneumonia.

Keywords