BMC Cancer (Nov 2024)
Comparison of machine learning methods for Predicting 3-Year survival in elderly esophageal squamous cancer patients based on oxidative stress
Abstract
Background Oxidative stress plays a key role in aging and cancer; however, there is currently a paucity of machine-learning studies investigating the relationship between oxidative stress and the prognosis of elderly patients with esophageal squamous cancer (ESCC).
Methods This study included consecutive elderly patients with ESCC who underwent curative resection between January 2013 and December 2020; they were divided into training and external validation cohorts. The relationship between oxidative stress biomarkers and prognosis was explored using stepwise Cox regression based on the Akaike information criterion, and a geriatric ESCC-related oxidative stress score (OSS) was constructed. To build a predictive model for 3-year overall survival (OS), three machine-learning methods were employed: decision tree (DT), random forest (RF), and support vector machine (SVM). These methods are widely used in data mining and pattern recognition. Each model was tested in the external validation cohort through 1000 resampling iterations. Performance was assessed using the area under the receiver operating characteristic curve (AUC) and calibration plots.
Results The training and validation cohorts consisted of 340 and 145 patients, respectively. In the training cohort, the 3-year OS rate was 59.2%. The OSS was constructed from systemic oxidative stress biomarkers using the training cohort. Pathological N stage, pathological T stage, tumor histological type, lymphovascular invasion, CEA, OSS, CA 19-9, and the amount of bleeding were the most important factors influencing 3-year OS. These eight features were used to train the RF, DT, and SVM models on the training cohort, and the models were then evaluated in the validation cohort. In the training cohort, the RF model demonstrated the highest predictive performance, with an AUC of 0.975 (0.962–0.987), compared with 0.784 (0.739–0.830) for the DT model and 0.879 (0.843–0.916) for the SVM model. In the external validation cohort, the RF model again exhibited the highest performance, with an AUC of 0.791 (0.717–0.864), compared with 0.717 (0.640–0.794) for the DT model and 0.779 (0.702–0.856) for the SVM model.
Conclusions The random forest clinical prediction model constructed based on OSS can effectively predict the prognosis of elderly patients with ESCC after curative surgery.
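The AUC values with intervals reported above come from evaluating each model over 1000 resampling iterations. The abstract does not state which resampling scheme or interval method was used, so the following is only a minimal sketch that assumes a nonparametric bootstrap with a percentile interval. The function names `roc_auc` and `bootstrap_auc_ci` and the toy data are hypothetical and are not taken from the study.

```python
import random


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_resamples=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap interval (assumed method, see lead-in)."""
    point = roc_auc(labels, scores)  # also fails early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_resamples:
        # Resample patients with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain a single class
        aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_resamples)]
    upper = aucs[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return point, lower, upper


# Toy data for illustration only: 1 = event within 3 years, score = predicted risk.
toy_labels = [0, 0, 1, 1, 0, 1, 1, 0]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.60, 0.50]
toy_point, toy_lower, toy_upper = bootstrap_auc_ci(toy_labels, toy_scores)
```

In practice the scores would be the predicted probabilities from the fitted RF, DT, or SVM model on the external validation cohort, and the same routine would be run once per model so that the intervals are comparable.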
Keywords