Heliyon (Sep 2024)
Development and validation of machine learning-based prediction model for severe pneumonia: A multicenter cohort study
Abstract
Severe pneumonia (SP) is a prevalent respiratory ailment characterized by high mortality and poor prognosis. Current scoring systems for pneumonia are time-consuming and have limited ability to predict SP early. To address this gap, this study aimed to develop a machine-learning model that uses inflammatory markers from peripheral blood for early prediction of SP. A total of 204 pneumonia patients from seven medical centers were studied, with 143 (68 SP cases) in the training cohort and 61 (32 SP cases) in the test cohort. Clinical characteristics and laboratory test results were collected at diagnosis. Six models were built and evaluated: Logistic Regression, Random Forest, Naïve Bayes, XGBoost, Support Vector Machine, and Decision Tree. Nine predictors (age, sex, WBC count, T-lymphocyte count, NLR, CRP, TNF-α, IL-4/IFN-γ ratio, and IL-6/IL-10 ratio) were selected through LASSO regression and clinical insight. The XGBoost model performed best, achieving an AUC of 0.901 (95% CI: 0.827 to 0.985) in the test cohort, with an accuracy of 0.803, sensitivity of 0.844, specificity of 0.759, and F1 score of 0.818. SHAP analysis identified elevated WBC count, older age, and elevated CRP as the top predictors. This concise predictive model, built on inflammatory biomarkers, shows considerable potential for the rapid assessment of SP risk, thereby facilitating timely preventive interventions.
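
As a consistency check on the reported test-cohort metrics, the following minimal sketch recomputes sensitivity, specificity, accuracy, and F1 score from a confusion matrix. The abstract does not report the confusion matrix itself. The counts used here (27 true positives, 5 false negatives, 22 true negatives, 7 false positives) are an assumption, inferred from the 32 SP and 29 non-SP test patients together with the rounded rates above. The function name `binary_metrics` is illustrative and not taken from the study.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                # recall on SP cases
    specificity = tn / (tn + fp)                # recall on non-SP cases
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)            # harmonic mean of precision and recall
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Assumed counts (not reported in the abstract): 32 SP and 29 non-SP
# patients in the test cohort, split to match the published rates.
metrics = binary_metrics(tp=27, fn=5, tn=22, fp=7)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed counts, the four values round to the figures reported for the test cohort (0.844, 0.759, 0.803, and 0.818), which suggests the published metrics are mutually consistent.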