Heliyon (Mar 2024)
Predicting mortality and recurrence in colorectal cancer: Comparative assessment of predictive models
Abstract
Introduction: Colorectal cancer (CRC) is a significant disease marked by high fatality rates, ranking as the third leading cause of global mortality. The main objective of this study was to assess the accuracy of predictive models in predicting both mortality and the probability of disease recurrence.
Method: A retrospective analysis was conducted on a cohort of 284 individuals diagnosed with colorectal cancer between 2001 and 2017. Demographic and clinical data, including gender, disease stage, age at diagnosis, recurrence status, and treatment details, were recorded. We evaluated several predictive models: Decision Trees, Random Forests, Random Survival Forests (RSF), Gradient Boosting, mboost, Deep Learning Neural Network (DLNN), and Cox regression. For each model, sensitivity, specificity, positive predictive value (PPV), area under the receiver operating characteristic curve (ROC area), and overall accuracy were calculated for the prediction of mortality and of disease recurrence. Analyses were performed in R version 4.1.3 and Python.
Results: For mortality prediction, the mboost model performed best overall, with a sensitivity of 96.9% (95% CI: 0.83–0.99), a specificity of 80% (95% CI: 0.59–0.93), a positive predictive value of 86.1% (95% CI: 0.70–0.95), an overall accuracy of 89% (95% CI: 0.78–0.96), and an ROC area of 0.88. Random Forests showed perfect sensitivity of 100% (95% CI: 0.85–1) but a specificity of 0% (95% CI: 0–0.52) and poor overall accuracy (50%). DLNN had the lowest performance metrics for mortality prediction, with a sensitivity of 24% (95% CI: 0.222–0.268), a specificity of 75% (95% CI: 0.73–0.77), and a positive predictive value of 42% (95% CI: 0.38–0.45).
The Gradient Boosting model showed the best performance in predicting recurrence, achieving perfect sensitivity of 100% (95% CI: 0.87–1), high specificity at 92.9% (95% CI: 0.76–0.99), and a high positive predictive value of 93.3% (95% CI: 0.77–0.99). Gradient Boosting achieved an ROC area of 96.4% and mboost an ROC area of 75%. DLNN had the lowest performance metrics for recurrence prediction, with a sensitivity of 1.75% (95% CI: 0.01–0.02), a specificity of 98% (95% CI: 0.97–0.98), and a positive predictive value of 52.6% (95% CI: 0.39–0.65).
Conclusion: The mboost model performed best for mortality prediction across the evaluation metrics, whereas Random Forests achieved perfect sensitivity at the cost of poor specificity and overall accuracy. For recurrence prediction, the Gradient Boosting model outperformed the other models, with perfect sensitivity, high specificity, and high positive predictive value. The DLNN model had the lowest performance metrics for both outcomes. Overall, the results emphasize the effectiveness of the mboost and Gradient Boosting models in predicting mortality and recurrence in colorectal cancer patients.
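As an illustration of how metrics of this kind are derived from a confusion matrix, the following minimal Python sketch computes sensitivity, specificity, PPV, and accuracy together with an exact (Clopper-Pearson) 95% confidence interval. The abstract does not state which interval method or which test-set counts the authors used, so the interval method, the function names (`binom_cdf`, `exact_ci`, `metrics`), and the counts below are all assumptions made for illustration only.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve(f, target: float) -> float:
    """Bisection on [0, 1] for a function f that is decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies at a larger p
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Clopper-Pearson (exact binomial) interval for the proportion x / n."""
    # Lower bound: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _solve(lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper bound: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else _solve(lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper


def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Return {metric: (estimate, ci_low, ci_high)} from confusion-matrix counts."""
    counts = {
        "sensitivity": (tp, tp + fn),   # true positives among actual positives
        "specificity": (tn, tn + fp),   # true negatives among actual negatives
        "ppv": (tp, tp + fp),           # true positives among predicted positives
        "accuracy": (tp + tn, tp + fp + fn + tn),
    }
    return {name: (x / n, *exact_ci(x, n)) for name, (x, n) in counts.items()}


# Hypothetical counts (NOT the study's data): a model that labels every
# patient as positive, so no negatives are ever predicted.
results = metrics(tp=22, fp=5, fn=0, tn=0)
for name, (estimate, low, high) in results.items():
    print(f"{name}: {estimate:.3f} (95% CI: {low:.2f}-{high:.2f})")
```

With these invented counts the sensitivity is 1.0 while the specificity is 0.0, which shows why a perfect sensitivity on its own does not indicate a useful model and why the metrics need to be read together.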