Insights into Imaging (May 2025)

The interpretable CT-based vision transformer model for preoperative prediction of clear cell renal cell carcinoma SSIGN score and outcome

  • Kaiyue Zhi,
  • Yanmei Wang,
  • Lei Yan,
  • Feng Hou,
  • Jie Wu,
  • Shuo Zhang,
  • He Zhu,
  • Lianzi Zhao,
  • Ning Wang,
  • Xia Zhao,
  • Xianjun Li,
  • Yicong Wang,
  • Chengcheng Chen,
  • Nan Wang,
  • Yuchao Xu,
  • Guangjie Yang,
  • Pei Nie

DOI
https://doi.org/10.1186/s13244-025-01972-0
Journal volume & issue
Vol. 16, no. 1
pp. 1 – 12

Abstract

Objectives: To develop and validate an interpretable CT-based vision transformer (ViT) model for preoperative prediction of the stage, size, grade, and necrosis (SSIGN) score and outcome in clear cell renal cell carcinoma (ccRCC) patients.

Methods: Eight hundred forty-five ccRCC patients from multiple centers were retrospectively enrolled. For each patient, 768 ViT features were extracted from each of the cortical medullary phase (CMP) and renal parenchymal phase (RPP) images. The CMP ViT model (CVM), RPP ViT model (RVM), and CMP-RPP combined ViT model (CRVM) were constructed to predict the SSIGN score in ccRCC patients. The area under the receiver operating characteristic curve (AUC) was used to evaluate the performance of each model. Decision curve analysis (DCA) was used to evaluate the net clinical benefit. The endpoint was progression-free survival (PFS). Kaplan–Meier survival analysis was used to assess the association between model-predicted SSIGN score and PFS. The SHapley Additive exPlanations (SHAP) approach was applied to interpret the prediction process of the CRVM.

Results: The CVM, RVM, and CRVM demonstrated good performance in predicting the SSIGN score, with AUCs of 0.859, 0.883, and 0.895, respectively, in the test cohort. DCA showed that the CRVM provided the highest net clinical benefit. In predicting PFS, the CRVM achieved a higher Harrell’s concordance index (C-index, 0.840) than the CVM (0.719) and RVM (0.773) in the test cohort. SHAP clarified the impact of ViT features on the CRVM’s SSIGN prediction at both the global and the individual level.

Conclusion: The interpretable CT-based CRVM may serve as a non-invasive biomarker for predicting the SSIGN score and outcome of ccRCC.

Critical relevance statement: Our findings outline the potential of an interpretable CT-based ViT biomarker for predicting the SSIGN score and outcome of ccRCC, which might facilitate patient counseling and assist clinicians in therapy decision-making for individual cases.

Key Points

  • Current first-line imaging does not provide preoperative prediction of the SSIGN score for ccRCC patients.
  • The ViT model could predict the SSIGN score and outcome of ccRCC patients.
  • This study can facilitate the development of personalized treatment for ccRCC patients.
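
For readers unfamiliar with the survival metric reported above, Harrell’s C-index can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors’ implementation: the function name `harrell_c_index`, the simplification of skipping pairs with tied follow-up times, and the toy data are all assumptions made for the example.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Fraction of comparable patient pairs ranked correctly by predicted risk.

    times  : follow-up time per patient
    events : 1 if progression was observed, 0 if censored
    risks  : model-predicted risk (higher means earlier expected progression)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped
        # a is the patient with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier patient was censored, so the pair is not comparable
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # higher risk progressed first: concordant
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied risks count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort (not data from the study)
toy_times = [5, 10, 15, 20]
toy_events = [1, 0, 1, 1]
toy_risks = [0.9, 0.7, 0.1, 0.8]
toy_c = harrell_c_index(toy_times, toy_events, toy_risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported C-indices of 0.719, 0.773, and 0.840 should be read.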

Keywords