Heliyon (Sep 2024)

Development and validation of machine learning-based prediction model for severe pneumonia: A multicenter cohort study

  • Zailin Yang,
  • Shuang Chen,
  • Xinyi Tang,
  • Jiao Wang,
  • Ling Liu,
  • Weibo Hu,
  • Yulin Huang,
  • Jian'e Hu,
  • Xiangju Xing,
  • Yakun Zhang,
  • Jun Li,
  • Haike Lei,
  • Yao Liu

Journal volume & issue
Vol. 10, no. 17
p. e37367

Abstract

Severe pneumonia (SP) is a prevalent respiratory illness characterized by high mortality and poor prognosis. Current pneumonia scoring systems are time-consuming and limited in their ability to predict SP early. To address this gap, this study aimed to develop a machine-learning model that uses inflammatory markers from peripheral blood for early prediction of SP. A total of 204 pneumonia patients from seven medical centers were studied, with 143 (68 SP cases) in the training cohort and 61 (32 SP cases) in the test cohort. Clinical characteristics and laboratory test results were collected at diagnosis. Several models, including Logistic Regression, Random Forest, Naïve Bayes, XGBoost, Support Vector Machine, and Decision Tree, were built and evaluated. Nine predictors (age, sex, WBC count, T-lymphocyte count, NLR, CRP, TNF-α, IL-4/IFN-γ ratio, and IL-6/IL-10 ratio) were selected through LASSO regression and clinical insight. The XGBoost model performed best, achieving an AUC of 0.901 (95% CI: 0.827 to 0.985) in the test cohort, with an accuracy of 0.803, sensitivity of 0.844, specificity of 0.759, and F1 score of 0.818. SHAP analysis identified elevated WBC count, older age, and elevated CRP as the top predictors. This concise predictive model based on inflammatory biomarkers shows potential for rapid assessment of SP risk, which could facilitate timely preventive interventions.
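
The reported test-cohort metrics can be cross-checked with simple arithmetic. The sketch below is illustrative only: the abstract does not report a confusion matrix, so the counts used here (27 true positives, 5 false negatives, 22 true negatives, 7 false positives) are an assumption, chosen because they fit the stated cohort of 61 patients with 32 SP cases and reproduce the published figures after rounding. The function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # recall on SP cases
    specificity = tn / (tn + fp)            # recall on non-SP cases
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)        # harmonic mean of precision and recall
    return {
        "accuracy": round(accuracy, 3),
        "sensitivity": round(sensitivity, 3),
        "specificity": round(specificity, 3),
        "f1": round(f1, 3),
    }


# Assumed counts (not stated in the abstract): 32 SP and 29 non-SP test patients.
ASSUMED_COUNTS = {"tp": 27, "fn": 5, "tn": 22, "fp": 7}

if __name__ == "__main__":
    for name, value in binary_metrics(**ASSUMED_COUNTS).items():
        print(f"{name}: {value}")
```

Under these assumed counts, the four values agree with the accuracy, sensitivity, specificity, and F1 score given in the abstract, which suggests the reported metrics are internally consistent at a single decision threshold.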

Keywords