Di-san junyi daxue xuebao (Feb 2022)

Early prediction model for disease progression of COVID-19 patients based on XGBoost: establishment and evaluation

  • WANG Ming,
  • CHENG Zhenhao,
  • HU Miao,
  • TANG Mingcheng,
  • XU Fumin,
  • WANG Li,
  • NIAN Yongjian,
  • LIU Kaijun

DOI
https://doi.org/10.16016/j.2097-0927.202107161
Journal volume & issue
Vol. 44, no. 3
pp. 195–202

Abstract

Objective To construct an XGBoost model that predicts the disease severity of COVID-19 patients from a dataset of their clinical characteristics.

Methods A total of 347 patients with laboratory-confirmed COVID-19 and complete medical records, admitted between February 10 and April 5, 2020, were screened from the medical record system of Huoshenshan Hospital. First, 21 features that differed significantly between groups were selected as input features for model training. The hyperparameters of the XGBoost model were then tuned by Bayesian optimization, and the optimal feature combination was selected based on feature importance. To further analyze the positive and negative effects of each feature's value on the prediction, the contribution of each feature was quantified and attributed using SHapley Additive exPlanations (SHAP). Finally, the performance of the XGBoost model was evaluated and compared with that of other machine learning methods: support vector machine (SVM), naïve Bayes (NB), logistic regression (LR), and k-nearest neighbors (KNN).

Results In this study, 21 features with significant differences between the severe and non-severe groups were selected for training and validation. On the validation set, the optimal 10-feature subset in the k-nearest neighbor model obtained the highest area under the curve (AUC) among the 4 models. XGBoost and SVM outperformed the other machine learning methods in prediction performance (test-set AUC of 0.9420 and 0.9594, respectively), and XGBoost trained significantly faster.

Conclusion An XGBoost-based prediction model was successfully built for the early prediction of disease severity in COVID-19 patients.
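The abstract relies on two computations that are easy to misread: the AUC used to rank the five classifiers, and the SHAP attributions used to explain feature effects. The following stdlib-only Python sketch illustrates both under stated assumptions; it is not the authors' code. The names `roc_auc`, `shapley_values`, and `toy_risk` are hypothetical, the toy model, labels, and scores are made up, and a feature "absent" from a coalition is simply set to a single baseline vector, which is a simplification of how SHAP averages over background data. Exact enumeration costs O(2^d) model calls, so for a 21-feature XGBoost model one would more likely use the `shap` library (its `TreeExplainer` implements the polynomial-time TreeSHAP algorithm) together with `sklearn.metrics.roc_auc_score`.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive (label 1)
    case scores higher than a randomly chosen negative (label 0) case.
    Tied pairs count as one half (Mann-Whitney U formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def shapley_values(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of model f at input x.

    Simplifying assumption: a feature outside the coalition takes its
    value from a single baseline vector. Enumerates all 2^(d-1)
    coalitions per feature, so it is only practical for small d."""
    d = len(x)

    def value(coalition: frozenset) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(d)]
        return f(z)

    phi = []
    for i in range(d):
        others = [j for j in range(d) if j != i]
        total = 0.0
        for k in range(d):  # k = size of a coalition S that excludes i
            weight = factorial(k) * factorial(d - k - 1) / factorial(d)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to S.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk(z: Sequence[float]) -> float:
    """Hypothetical severity score: two main effects plus one interaction."""
    return 0.5 * z[0] + 0.3 * z[1] + 0.2 * z[0] * z[2]


labels = [0, 0, 1, 1, 1, 0]  # made-up labels: 1 = severe
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
print(round(roc_auc(labels, scores), 4))  # 0.8889

phi = shapley_values(toy_risk, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
print([round(v, 3) for v in phi])  # [0.8, 0.6, 0.3]
```

The three attributions sum to 1.7, which equals `toy_risk(x) - toy_risk(baseline)`, and the interaction term's contribution (0.6) is split evenly between features 0 and 2. A positive value pushes the toy score toward the severe class relative to the baseline and a negative value pushes it away, which is the sense of "positive and negative effects" that the abstract analyzes with SHAP.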

Keywords