Cancer Medicine (Apr 2024)

Application of interpretable machine learning algorithms to predict distant metastasis in ovarian clear cell carcinoma

  • Qin‐Hua Guo,
  • Feng‐Chun Xie,
  • Fang‐Min Zhong,
  • Wen Wen,
  • Xue‐Ru Zhang,
  • Xia‐Jing Yu,
  • Xin‐Lu Wang,
  • Bo Huang,
  • Li‐Ping Li,
  • Xiao‐Zhong Wang

DOI
https://doi.org/10.1002/cam4.7161
Journal volume & issue
Vol. 13, no. 7
pp. n/a – n/a

Abstract

Background: Ovarian clear cell carcinoma (OCCC) is a subtype of ovarian epithelial carcinoma (OEC) known for its limited responsiveness to chemotherapy, and the onset of distant metastasis significantly affects patient prognosis. This study aimed to identify potential risk factors for distant metastasis in OCCC.

Methods: Using the Surveillance, Epidemiology, and End Results (SEER) database, we identified patients diagnosed with OCCC between 2004 and 2015. The most influential factors were selected with the Gaussian Naive Bayes (GNB) and AdaBoost machine learning algorithms, employing a Venn test for further refinement. Six machine learning (ML) techniques, namely XGBoost, LightGBM, Random Forest (RF), Adaptive Boosting (AdaBoost), Support Vector Machine (SVM), and Multilayer Perceptron (MLP), were then used to construct predictive models for distant metastasis. SHapley Additive exPlanations (SHAP) analysis provided a visual interpretation for individual patients. Model performance was assessed using accuracy, sensitivity, specificity, positive predictive value, negative predictive value, F1 score, and the area under the receiver operating characteristic curve (AUC).

Results: For predicting distant metastasis, the RF model outperformed the other five machine learning algorithms. Its accuracy, sensitivity, specificity, positive predictive value, negative predictive value, F1 score, and AUC (95% CI) were 0.792 (0.762–0.823), 0.904 (0.835–0.973), 0.759 (0.731–0.787), 0.221 (0.186–0.256), 0.974 (0.967–0.982), 0.353 (0.306–0.399), and 0.834 (0.696–0.967), respectively. Additionally, the RF model had the lowest Brier score (95% CI) on the calibration curve, 0.06256 (0.05753–0.06759). SHAP analysis provided independent explanations, reaffirming the critical clinical factors associated with the risk of metastasis in OCCC patients.

Conclusions: This study established a predictive model for distant metastasis in OCCC patients using machine learning techniques, offering support to clinicians in making informed clinical decisions.
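The validation metrics named in the abstract are all simple functions of a confusion matrix, and the Brier score is the mean squared error of the predicted probabilities. The following is a minimal Python sketch of those definitions, not the authors' code. The class and function names, the counts, and the probabilities are hypothetical and chosen only for illustration; they are not taken from the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary classifier (positive = distant metastasis)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def classification_metrics(c: ConfusionCounts) -> dict:
    """Standard metrics; assumes every denominator below is non-zero."""
    sensitivity = c.tp / (c.tp + c.fn)   # recall on positive cases
    specificity = c.tn / (c.tn + c.fp)   # recall on negative cases
    ppv = c.tp / (c.tp + c.fp)           # positive predictive value
    npv = c.tn / (c.tn + c.fn)           # negative predictive value
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


def brier_score(probabilities, labels) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    pairs = list(zip(probabilities, labels))
    return sum((p - y) ** 2 for p, y in pairs) / len(pairs)


# Hypothetical, illustrative counts with a rare positive class
# (10 positives among 130 cases); NOT the study's data.
example = ConfusionCounts(tp=9, fp=30, tn=90, fn=1)
metrics = classification_metrics(example)

# Hypothetical predicted probabilities and true outcomes.
example_brier = brier_score([0.9, 0.1, 0.2, 0.8], [1, 0, 0, 1])
```

With a rare positive class, as in these illustrative counts, a model can combine high sensitivity and high negative predictive value with a low positive predictive value, because even a moderate false-positive rate produces many more false positives than true positives. The RF figures reported in the abstract show the same qualitative pattern.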

Keywords