BMC Cancer (Apr 2024)

Identification of prognostic signatures in remnant gastric cancer through an interpretable risk model based on machine learning: a multicenter cohort study

  • Zhouwei Zhan,
  • Bijuan Chen,
  • Hui Cheng,
  • Shaohua Xu,
  • Chunping Huang,
  • Sijing Zhou,
  • Haiting Chen,
  • Xuanping Lin,
  • Ruyu Lin,
  • Wanting Huang,
  • Xiaohuan Ma,
  • Yu Fu,
  • Zhipeng Chen,
  • Hanchen Zheng,
  • Songchang Shi,
  • Zengqing Guo,
  • Lihui Zhang

DOI
https://doi.org/10.1186/s12885-024-12303-9
Journal volume & issue
Vol. 24, no. 1
pp. 1 – 14

Abstract

Objective: The purpose of this study was to develop an individual survival prediction model, based on multiple machine learning (ML) algorithms, to predict survival probability in patients with remnant gastric cancer (RGC).

Methods: Clinicopathologic data of 286 patients with RGC who underwent surgery (radical or palliative resection) were collected from a multi-institution database and analyzed retrospectively. The patients were randomly allocated to a training cohort (80%) and a test cohort (20%). Nine commonly used ML methods were employed to construct survival prediction models. Algorithm performance was assessed using accuracy, precision, recall, F1-score, area under the receiver operating characteristic curve (AUC), confusion matrices, five-fold cross-validation, decision curve analysis (DCA), and calibration curves. The best model was selected through verification and validation and was explained using the SHapley Additive exPlanations (SHAP) approach.

Results: Compared with the traditional methods, the ML-based RGC survival prediction models exhibited good performance. All models except the decision tree performed well, with a mean ROC AUC above 0.7. The DCA findings suggest that the developed models have the potential to improve clinical decision-making and thereby patient outcomes. The calibration curves showed that all models except the decision tree had good predictive performance. CatBoost-based modeling with SHAP analysis indicated that five-year survival probability is significantly influenced by the lymph node ratio (LNR), T stage, tumor size, resection margins, perineural invasion, and distant metastasis.

Conclusions: This study established ML-based models for predicting five-year survival probability in RGC patients; the models showed high accuracy and practical value.
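The random 80/20 allocation and four of the listed metrics (accuracy, precision, recall, F1-score) can be illustrated with a minimal sketch. It is not the authors' code. The function names `split_train_test` and `classification_metrics`, the seed, the rounding of the 80% cut-off, and the toy labels are all assumptions made for illustration. The abstract does not report the exact cohort sizes or any per-patient data.

```python
import random


def split_train_test(ids, train_fraction=0.8, seed=0):
    """Randomly allocate ids to a training and a test cohort.

    Assumption: the training size is rounded down; the abstract
    does not say how the 80/20 split was rounded.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (0/1)."""
    pairs = list(zip(y_true, y_pred))
    # Cells of the 2x2 confusion matrix.
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion_matrix": {"tp": tp, "tn": tn, "fp": fp, "fn": fn},
    }


if __name__ == "__main__":
    # 286 hypothetical patient identifiers, matching the cohort size only.
    train_ids, test_ids = split_train_test(range(286))
    print(len(train_ids), len(test_ids))

    # Toy labels (1 = survived five years, 0 = did not); not study data.
    print(classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In practice a library such as scikit-learn would normally handle the split, cross-validation, and AUC. The plain-Python version is used here only to make the definitions explicit.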

Keywords