Cancer Medicine (Mar 2024)

Construction and validation of machine learning models for predicting distant metastases in newly diagnosed colorectal cancer patients: A large‐scale and real‐world cohort study

  • Ran Wei,
  • Guanhua Yu,
  • Xishan Wang,
  • Zheng Jiang,
  • Xu Guan

DOI
https://doi.org/10.1002/cam4.6971
Journal volume & issue
Vol. 13, no. 5

Abstract

Background: More accurate prediction of distant metastases (DM) in patients with colorectal cancer (CRC) would optimize individualized treatment and follow‐up strategies. Multiple prediction models based on machine learning were developed to assess the likelihood of developing DM.

Methods: Clinicopathological features of patients with CRC were obtained from the National Cancer Center (NCC, China) and the Surveillance, Epidemiology, and End Results (SEER) database. The algorithms used to build the prediction models were random forest (RF), logistic regression, extreme gradient boosting, deep neural networks, and k‐nearest neighbors. Model performance was evaluated using receiver operating characteristic (ROC) curves.

Results: In total, 200,958 patients with CRC were identified (3241 from NCC and 197,717 from SEER), of whom 21,736 (10.8%) developed DM. The machine‐learning‐based prediction models for DM were constructed with the 12 features that remained after iterative filtering. The RF model performed best, with areas under the ROC curve of 0.843, 0.793, and 0.806 on the training, test, and external validation sets, respectively. For the risk stratification analysis, patients were separated into high‐, middle‐, and low‐risk groups according to their risk scores. Patients in the high‐risk group had the highest incidence of DM and the worst prognosis. Surgery, chemotherapy, and radiotherapy significantly improved the prognosis of the high‐risk and middle‐risk groups, whereas the low‐risk group benefited only from surgery and chemotherapy.

Conclusion: The RF‐based model accurately predicted the likelihood of DM and identified patients with CRC in the high‐risk group, providing guidance for personalized clinical decision‐making.
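To make the two evaluation steps in the abstract concrete, the following is a minimal sketch of how an area under the ROC curve and a three‐way risk stratification can be computed from predicted risk scores. It is not the authors' code: the labels, scores, and the cutoffs 0.33 and 0.66 are hypothetical toy values, and the abstract does not state how the study's risk thresholds were chosen.

```python
from typing import Dict, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case (DM = 1)
    receives a higher score than a randomly chosen negative case (DM = 0),
    counting ties as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratify(scores: Sequence[float], low_cut: float, high_cut: float) -> List[str]:
    """Assign each risk score to a low, middle, or high group.

    The cutoffs are hypothetical inputs, not values from the study.
    """
    groups = []
    for s in scores:
        if s < low_cut:
            groups.append("low")
        elif s < high_cut:
            groups.append("middle")
        else:
            groups.append("high")
    return groups


def incidence_by_group(labels: Sequence[int], groups: Sequence[str]) -> Dict[str, float]:
    """Observed fraction of DM cases within each risk group."""
    totals: Dict[str, int] = {}
    events: Dict[str, int] = {}
    for y, g in zip(labels, groups):
        totals[g] = totals.get(g, 0) + 1
        events[g] = events.get(g, 0) + y
    return {g: events[g] / totals[g] for g in totals}


# Toy data (hypothetical): 1 = developed DM, 0 = did not.
labels = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.05, 0.10, 0.30, 0.35, 0.40, 0.60, 0.80, 0.90]

auc = roc_auc(labels, scores)
groups = stratify(scores, low_cut=0.33, high_cut=0.66)
incidence = incidence_by_group(labels, groups)

print(f"AUC = {auc:.4f}")
for name in ("low", "middle", "high"):
    print(f"{name}: DM incidence = {incidence[name]:.2f}")
```

The pairwise AUC is quadratic in the number of patients, which is fine for illustration. A cohort the size of the one described here would call for a rank‐based or library implementation.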

Keywords