Scientific Reports (Apr 2024)

Machine learning-based survival prediction nomogram for postoperative parotid mucoepidermoid carcinoma

  • Zongwei Huang
  • Zihan Chen
  • Ying Li
  • Ting Lin
  • Sunqin Cai
  • Wenxi Wu
  • Lishui Wu
  • Siqi Xu
  • Jun Lu
  • Sufang Qiu

DOI
https://doi.org/10.1038/s41598-024-58329-8
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 13

Abstract

Parotid mucoepidermoid carcinoma (P-MEC) is a significant histopathological subtype of salivary gland cancer with inherent heterogeneity and complexity. Existing clinical models do not adequately support personalized treatment decisions for patients. In response, we compared four machine learning algorithms with traditional analysis for predicting the overall survival (OS) of P-MEC patients. Using the SEER database, we analyzed data from 882 postoperative P-MEC patients (stages I–IVA). Univariate Cox regression and four machine learning techniques (random forest, LASSO, XGBoost, and best subset regression [BSR]) were employed for variable selection. The optimal model was selected using backward stepwise regression, the Akaike Information Criterion (AIC), and the area under the curve (AUC). Bootstrap resampling was used for internal validation, and prediction accuracy was assessed with the C-index, time-dependent ROC curves, and calibration curves. The model's clinical utility was evaluated using decision curve analysis (DCA). The 3-, 5-, and 10-year OS rates were 0.887, 0.841, and 0.753, respectively. XGBoost, BSR, and LASSO showed the best predictive performance, identifying seven key prognostic factors: age, pathological grade, T stage, N stage, radiation therapy, chemotherapy, and marital status. The resulting nomogram achieved C-indices of 0.8499 (3-year), 0.8557 (5-year), and 0.8375 (10-year), with corresponding AUC values of 0.8670, 0.8879, and 0.8767. The model also highlighted the clinical significance of postoperative radiotherapy across varying risk levels. Our machine learning-based prognostic model outperforms traditional models in prediction and offers superior visualization of variable importance.
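
The abstract reports prediction accuracy as a C-index. As a minimal sketch of what that statistic measures, the following computes Harrell's concordance index on a toy data set. It is not the authors' code: the function name `harrell_c_index`, the toy values, and the choice to skip pairs with tied survival times are all assumptions made for illustration.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    times  -- observed follow-up time per patient
    events -- 1 if death was observed, 0 if the patient was censored
    risks  -- predicted risk score (higher means worse expected survival)

    Assumption for this sketch: pairs with identical times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: ignore tied survival times
        # Let `a` be the patient with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # higher risk went with earlier death
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical toy data, not taken from the study.
toy_times = [2, 4, 6, 8, 10]
toy_events = [1, 1, 0, 1, 0]
toy_risks = [0.9, 0.4, 0.5, 0.3, 0.1]
print(harrell_c_index(toy_times, toy_events, toy_risks))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk, which is the scale on which the reported C-indices of roughly 0.84 to 0.86 should be read.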

Keywords