Journal of Inflammation Research (Jan 2025)
Comprehensive Sepsis Risk Prediction in Leukemia Using a Random Forest Model and Restricted Cubic Spline Analysis
Yanqi Kou,1,2,* Yuan Tian,2,3,* Yanping Ha,2,3,* Shijie Wang,1,* Xiaobai Sun,1 Shuxin Lv,1 Botao Luo,2,3 Yuping Yang,2 Ling Qin1
1Department of Hematology, The First Affiliated Hospital, and College of Clinical Medicine of Henan University of Science and Technology, Luoyang, Henan Province, People’s Republic of China
2Department of Gastroenterology, Affiliated Hospital of Guangdong Medical University, Guangdong Medical University, Zhanjiang City, Guangdong Province, People’s Republic of China
3Department of Pathology, Guangdong Medical University, Zhanjiang City, Guangdong Province, People’s Republic of China
*These authors contributed equally to this work
Correspondence: Yuping Yang, Department of Gastroenterology, Affiliated Hospital of Guangdong Medical University, Guangdong Medical University, No. 2 Wenming East Road, Xiashan, Zhanjiang, Guangdong, 524000, People’s Republic of China, Email [email protected]; Ling Qin, Department of Hematology, The First Affiliated Hospital and College of Clinical Medicine, Henan University of Science and Technology, 24 Jinghua Road, Jianxi District, Luoyang, Henan Province, 471003, People’s Republic of China, Email [email protected]
Abstract
Sepsis is a severe complication in leukemia patients and contributes to high mortality. Identifying early predictors of sepsis is crucial for timely intervention. This study aimed to develop and validate a predictive model for sepsis risk in leukemia patients using machine learning techniques.
Methods: This retrospective study included 4310 leukemia patients admitted to the Affiliated Hospital of Guangdong Medical University from 2005 to 2024, with 70% used for training and 30% for validation. Feature selection was performed using univariate logistic regression, LASSO, and the Boruta algorithm, followed by multivariate logistic regression analysis. Seven machine learning models were constructed and evaluated using receiver operating characteristic (ROC) curves and decision curve analysis (DCA). Shapley additive explanations (SHAP) were applied to interpret the results, and restricted cubic spline (RCS) regression was used to explore nonlinear relationships between variables and sepsis risk. Interactions among predictors were also examined to better understand their potential interrelationships.
Results: The random forest (RF) model outperformed all others, achieving an AUC of 0.765 in the training cohort and 0.700 in the validation cohort. Key predictors of sepsis identified by SHAP analysis included C-reactive protein (CRP), procalcitonin (PCT), neutrophil count (Neut), lymphocyte count (Lymph), thrombin time (TT), red blood cell count (RBC), total bile acid (TBA), and systolic blood pressure (SBP). RCS analysis revealed significant nonlinear associations between sepsis risk and CRP, PCT, Neut, Lymph, TT, RBC, and SBP. Pairwise correlation analysis further revealed interactions among these variables.
Conclusion: The RF model exhibited robust predictive power for sepsis in leukemia patients, providing clinicians with a valuable tool for early risk assessment and the optimization of treatment strategies.
Keywords: leukemia, sepsis, prediction model, biomarkers, machine learning
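The evaluation design described in the Methods (a 70%/30% split of 4310 patients, with each cohort scored by the area under the ROC curve) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are simulated, the function names `roc_auc` and `split_70_30` are hypothetical, and the scores stand in for the output of a fitted classifier rather than the study's random forest. The AUC is computed from its rank interpretation, the probability that a randomly chosen sepsis case receives a higher score than a randomly chosen non-sepsis case.

```python
import random


def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def split_70_30(rows, seed=0):
    """Shuffle a copy of the rows and cut it into 70% training and 30% validation."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Simulated cohort (assumption): 4310 rows of (sepsis label, risk score).
# The 20% event rate and the score distributions are arbitrary choices.
rng = random.Random(42)
cohort = []
for _ in range(4310):
    label = 1 if rng.random() < 0.2 else 0
    score = rng.gauss(1.0 if label else 0.0, 1.0)
    cohort.append((label, score))

train, valid = split_70_30(cohort)
for name, part in (("training", train), ("validation", valid)):
    labels = [y for y, _ in part]
    scores = [s for _, s in part]
    print(f"{name}: n={len(part)}, AUC={roc_auc(labels, scores):.3f}")
```

In a real analysis the scores would be the model's predicted probabilities, and the gap between the training and validation AUC (0.765 versus 0.700 in this study) is what indicates how much performance is lost on patients the model has not seen.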