Cancer Medicine (Mar 2023)

A 5‐year survival status prognosis of nonmetastatic cervical cancer patients through machine learning algorithms

  • Wenke Yu,
  • Yanwei Lu,
  • Huafeng Shou,
  • Hong’en Xu,
  • Lei Shi,
  • Xiaolu Geng,
  • Tao Song

DOI
https://doi.org/10.1002/cam4.5477
Journal volume & issue
Vol. 12, no. 6
pp. 6867 – 6876

Abstract

Background: Few highly accurate prediction models exist for patients with nonmetastatic cervical cancer (CC). This study aimed to construct and compare machine learning (ML) models for predicting the 5‐year survival status of CC patients, using the Surveillance, Epidemiology, and End Results (SEER) public database of the National Cancer Institute.

Methods: Data registered from 2004 to 2016 were extracted and randomly divided into training and validation cohorts (8:2). Least absolute shrinkage and selection operator (LASSO) regression was used to identify significant factors. Four predictive models were then constructed: logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGBoost). The models were evaluated and compared using receiver operating characteristic (ROC) curves with areas under the curve (AUCs) and decision curve analysis (DCA).

Results: A total of 13,802 patients were included and divided into training (N = 11,041) and validation (N = 2761) cohorts. LASSO regression identified seven factors. In the training cohort, the XGBoost model showed the best performance (AUC = 0.8400) compared with the other three models (all p < 0.05 by DeLong's test). In the validation cohort, the XGBoost model also showed better predictive ability (AUC = 0.8365) than the LR and SVM models (both p < 0.05 by DeLong's test), although the difference between the XGBoost and RF models was not statistically significant (p = 0.4251 by DeLong's test). The XGBoost model was also superior in the DCA, and feature importance analysis indicated that tumor stage was the most important of the seven factors.

Conclusions: The XGBoost model proved to be an effective algorithm with better predictive ability than the other models tested. It is proposed as a tool to support decision‐making for nonmetastatic CC patients in the future.
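The evaluation quantities named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code, and the abstract does not say what software they used. The function names (`split_indices`, `auc`, `net_benefit`), the seed, and the toy labels and scores are all hypothetical. The sketch shows an 8:2 random split, the AUC computed from ranks, and the net benefit that decision curve analysis plots at a single threshold probability. Flooring 0.8 × 13,802 gives 11,041, which matches the reported cohort sizes, but the rounding rule itself is an assumption.

```python
import random


def split_indices(n, train_fraction=0.8, seed=0):
    """Shuffle 0..n-1 and cut into training and validation index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * train_fraction)  # floor (assumed rounding rule)
    return idx[:n_train], idx[n_train:]


def auc(labels, scores):
    """AUC from ranks: the chance a random positive outscores a random negative.

    Tied scores share their average rank, so each tie counts as 0.5.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def net_benefit(labels, scores, threshold):
    """Net benefit at one threshold probability, as used in decision curves."""
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


if __name__ == "__main__":
    train, valid = split_indices(13802)
    print(len(train), len(valid))

    # Toy data only: 1 = event, 0 = no event; scores are predicted probabilities.
    y = [0, 0, 1, 1]
    p = [0.1, 0.4, 0.35, 0.8]
    print(auc(y, p), net_benefit(y, p, 0.5))
```

A full decision curve repeats `net_benefit` over a range of thresholds and compares each model against the treat-all and treat-none strategies. DeLong's test, which the study used to compare AUCs, is not implemented here.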

Keywords