BMC Medical Informatics and Decision Making (Aug 2024)
A risk prediction model based on machine learning algorithm for parastomal hernia after permanent colostomy
Abstract
Objective: To develop a machine learning-based risk prediction model for postoperative parastomal hernia (PSH) in colorectal cancer patients undergoing permanent colostomy, helping nurses identify high-risk groups and devise preventive care strategies.

Methods: A case-control study was conducted on 495 colorectal cancer patients who underwent permanent colostomy at the Second Affiliated Hospital of Anhui Medical University from June 2017 to June 2023, with a 1-year follow-up period. Patients were categorized into PSH and non-PSH groups according to whether PSH occurred within 1 year after surgery. Data were split into training (70%) and testing (30%) sets. Variables were selected using Least Absolute Shrinkage and Selection Operator (LASSO) regression, and binary classification models were built using Logistic Regression (LR), Support Vector Classification (SVC), K-Nearest Neighbor (KNN), Random Forest (RF), Light Gradient Boosting Machine (LGBM), and Extreme Gradient Boosting (XGBoost). The binary label was 1 for PSH occurrence and 0 for no PSH occurrence. Parameters were optimized via 5-fold cross-validation. Model performance was evaluated using area under the curve (AUC), specificity, sensitivity, accuracy, positive predictive value, negative predictive value, and F1-score. Clinical utility was evaluated using decision curve analysis (DCA), the model was explained using Shapley Additive Explanations (SHAP), and the model was visualized using a nomogram.

Results: The incidence of PSH within 1 year was 29.1% (144 patients). Among the models tested, the RF model showed the highest discrimination, with an AUC of 0.888 (95% CI: 0.881–0.935), along with superior specificity, accuracy, sensitivity, and F1-score. It also showed the highest clinical net benefit on the DCA curve. SHAP analysis identified the 10 most influential variables associated with PSH risk: body mass index (BMI), operation duration, history and status of chronic obstructive pulmonary disease (COPD), prealbumin, tumor-node-metastasis (TNM) staging, stoma site, thickness of rectus abdominis muscle (TRAM), C-reactive protein (CRP), American Society of Anesthesiologists (ASA) physical status classification, and stoma diameter. The SHAP plots illustrated how these factors influence individual PSH outcomes. The nomogram was used for model visualization.

Conclusion: The Random Forest model demonstrated robust predictive performance and clinical relevance in forecasting colonic PSH. This model aids early identification of high-risk patients and guides preventive care.
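For readers less familiar with the evaluation measures named in the abstract, the following is a minimal Python sketch of how sensitivity, specificity, accuracy, positive and negative predictive value, F1-score, AUC, and the net benefit used in decision curve analysis can be computed from binary labels (1 = PSH, 0 = no PSH) and predicted probabilities. It is an illustration only, not the authors' code: the function names, the 0.5 classification cutoff, and the toy data are assumptions, and the net benefit follows the standard definition TP/n - (FP/n) * pt/(1 - pt) at threshold probability pt.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_prob: Sequence[float],
                     threshold: float) -> Dict[str, int]:
    """Count TP, FP, TN, FN when predicting 1 if probability >= threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1:
            counts["tp" if label == 1 else "fp"] += 1
        else:
            counts["fn" if label == 1 else "tn"] += 1
    return counts


def _ratio(numerator: float, denominator: float) -> float:
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                           threshold: float = 0.5) -> Dict[str, float]:
    """Threshold-based metrics listed in the abstract (cutoff is an assumption)."""
    c = confusion_counts(y_true, y_prob, threshold)
    sensitivity = _ratio(c["tp"], c["tp"] + c["fn"])
    ppv = _ratio(c["tp"], c["tp"] + c["fp"])
    return {
        "sensitivity": sensitivity,
        "specificity": _ratio(c["tn"], c["tn"] + c["fp"]),
        "accuracy": _ratio(c["tp"] + c["tn"], len(y_true)),
        "ppv": ppv,
        "npv": _ratio(c["tn"], c["tn"] + c["fn"]),
        "f1": _ratio(2 * ppv * sensitivity, ppv + sensitivity),
    }


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for label, p in zip(y_true, y_prob) if label == 1]
    neg = [p for label, p in zip(y_true, y_prob) if label == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float],
                threshold: float) -> float:
    """Net benefit at threshold probability pt (0 < pt < 1), as plotted in DCA."""
    c = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return c["tp"] / n - (c["fp"] / n) * threshold / (1 - threshold)


# Toy data (hypothetical, not from the study).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
toy_probs = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.05, 0.7, 0.35]

if __name__ == "__main__":
    print(classification_metrics(toy_labels, toy_probs))
    print("AUC:", auc(toy_labels, toy_probs))
    for pt in (0.2, 0.5):
        print("net benefit at", pt, "=", net_benefit(toy_labels, toy_probs, pt))
```

A decision curve is obtained by evaluating `net_benefit` over a range of threshold probabilities and comparing the model against the "treat all" and "treat none" strategies; the pairwise AUC shown here is quadratic in sample size and is meant for clarity rather than efficiency.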
Keywords