Cancer Medicine (Jun 2024)

Explainable machine learning predicts survival of retroperitoneal liposarcoma: A study based on the SEER database and external validation in China

  • Maoyu Wang,
  • Zhizhou Li,
  • Shuxiong Zeng,
  • Ziwei Wang,
  • Yidie Ying,
  • Wei He,
  • Zhensheng Zhang,
  • Huiqing Wang,
  • Chuanliang Xu

DOI
https://doi.org/10.1002/cam4.7324
Journal volume & issue
Vol. 13, no. 11

Abstract

Objective: We developed explainable machine learning models to predict the overall survival (OS) of patients with retroperitoneal liposarcoma (RLPS), with the aim of making the modeling results more explainable and transparent.

Methods: We collected clinicopathological information on RLPS patients from the Surveillance, Epidemiology, and End Results (SEER) database and split them into training and validation sets at a 7:3 ratio. We also obtained an external validation cohort from The First Affiliated Hospital of Naval Medical University (Shanghai, China). LASSO regression and multivariate Cox proportional hazards analysis were used to identify relevant risk factors, which were then combined to develop six machine learning (ML) models: Cox proportional hazards model (Coxph), random survival forest (RSF), ranger, gradient boosting with component-wise linear models (GBM), decision trees, and boosting trees. The predictive performance of these ML models was evaluated using the concordance index (C-index), the integrated cumulative/dynamic area under the curve (AUC), the integrated Brier score, and the Cox–Snell residual plot. For a global explanation of the optimal model, we used time-dependent variable importance, partial dependence survival plots, and aggregated survival SHapley Additive exPlanations (SurvSHAP) plots. For a local explanation of the optimal model, we used SurvSHAP(t) and survival local interpretable model-agnostic explanations (SurvLIME) plots.

Results: The final ML models consist of six factors: the patient's age, gender, marital status, and surgical history, and the tumor's histopathological classification, histological grade, and SEER stage. The prognostic models show strong discriminative ability, with the ranger model performing best. In the training set, validation set, and external validation set, the AUCs for 1-, 3-, and 5-year OS are all above 0.83, and the integrated Brier scores are consistently below 0.15. The explainability analysis of the ranger model indicates that histological grade, histopathological classification, and age are the most influential factors in predicting OS.

Conclusions: The ranger ML prognostic model performs best and can be used to predict the OS of RLPS patients, providing a valuable reference to help clinicians make informed decisions in advance.
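
As an illustration of the concordance index (C-index) named among the evaluation metrics, the following is a minimal Python sketch of Harrell's C-index. It is not the authors' code. The function name `harrell_c_index` and the toy data are hypothetical, and pairs with tied survival times are skipped for simplicity, which full implementations handle more carefully.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Simplified Harrell's C-index for right-censored survival data.

    times       -- follow-up time per patient
    events      -- 1 if death was observed, 0 if censored
    risk_scores -- model output, higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: tied times are skipped in this sketch
        # Let `a` be the patient with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk died earlier: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, not taken from the study.
toy_times = [5, 10, 15, 20]
toy_events = [1, 0, 1, 1]
toy_risks = [0.9, 0.8, 0.3, 0.4]
print(harrell_c_index(toy_times, toy_events, toy_risks))  # 0.75
```

In the toy data, four pairs are comparable and three are ranked correctly, giving 0.75. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.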

Keywords