Scientific Reports (Oct 2023)

Development of machine learning prognostic models for overall survival of prostate cancer patients with lymph node-positive

  • Zi-He Peng,
  • Juan-Hua Tian,
  • Bo-Hong Chen,
  • Hai-Bin Zhou,
  • Hang Bi,
  • Min-Xin He,
  • Ming-Rui Li,
  • Xin-Yu Zheng,
  • Ya-Wen Wang,
  • Tie Chong,
  • Zhao-Lun Li

DOI
https://doi.org/10.1038/s41598-023-45804-x
Journal volume & issue
Vol. 13, no. 1
pp. 1 – 11

Abstract

Read online

Abstract Prostate cancer (PCa) patients with lymph node involvement (LNI) constitute a single-risk group with varied prognoses. Existing studies on this group have focused solely on those who underwent prostatectomy (RP), using statistical models to predict prognosis. This study aimed to develop an easily accessible individual survival prediction tool based on multiple machine learning (ML) algorithms to predict survival probability for PCa patients with LNI. A total of 3280 PCa patients with LNI were identified from the Surveillance, Epidemiology, and End Results (SEER) database, covering the years 2000–2019. The primary endpoint was overall survival (OS). Gradient Boosting Survival Analysis (GBSA), Random Survival Forest (RSF), and Extra Survival Trees (EST) were used to develop prognosis models, which were compared to Cox regression. Discrimination was evaluated using the time-dependent areas under the receiver operating characteristic curve (time-dependent AUC) and the concordance index (c-index). Calibration was assessed using the time-dependent Brier score (time-dependent BS) and the integrated Brier score (IBS). Moreover, the beeswarm summary plot in SHAP (SHapley Additive exPlanations) was used to display the contribution of variables to the results. The 3280 patients were randomly split into a training cohort (n = 2624) and a validation cohort (n = 656). Nine variables including age at diagnosis, race, marital status, clinical T stage, prostate-specific antigen (PSA) level at diagnosis, Gleason Score (GS), number of positive lymph nodes, radical prostatectomy (RP), and radiotherapy (RT) were used to develop models. The mean time-dependent AUC for GBSA, RSF, and EST was 0.782 (95% confidence interval [CI] 0.779–0.783), 0.779 (95% CI 0.776–0.780), and 0.781 (95% CI 0.778–0.782), respectively, which were higher than the Cox regression model of 0.770 (95% CI 0.769–0.773). Additionally, all models demonstrated almost similar calibration, with low IBS. A web-based prediction tool was developed using the best-performing GBSA, which is accessible at https://pengzihexjtu-pca-n1.streamlit.app/ . ML algorithms showed better performance compared with Cox regression and we developed a web-based tool, which may help to guide patient treatment and follow-up.