npj Precision Oncology (Oct 2024)
Development and validation of machine learning models for young-onset colorectal cancer risk stratification
Abstract
The incidence of young-onset colorectal cancer (YOCRC, diagnosed before age 50) has increased significantly worldwide. The performance of the fecal immunochemical test in detecting YOCRC is unsatisfactory. Using routine clinical data, we aimed to develop machine learning (ML) models to identify individuals at high risk of YOCRC who require further colonoscopy. We retrospectively extracted data from 10,874 young individuals. Multiple supervised ML techniques were used to distinguish individuals with and without CRC; the classifiers were trained, internally validated, and temporally validated. In the internal validation cohort, the Random Forest (RF) model demonstrated good performance, with an AUC of 0.859 and the highest recall, 0.840. In the temporal validation cohort, the RF model also exhibited good classification performance, achieving an AUC of 0.888 and the highest recall, 0.872. An RF-based approach is effective and feasible for YOCRC risk stratification. It could be valuable in assessing the risk of YOCRC so that subsequent clinical management decisions, including further colonoscopy, can be made. Registration: This study was registered with ClinicalTrials.gov (NCT06342622) on March 15, 2024.
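For readers less familiar with the two metrics reported above, the following is a minimal sketch of how AUC and recall can be computed from a classifier's predicted risk scores. It is not the study's code: the labels, scores, and the 0.5 decision threshold are hypothetical toy values chosen only for illustration, and the function names `auc` and `recall` are assumptions of this sketch.

```python
# Illustrative only: toy labels and scores, not data from the study.

def recall(y_true, y_pred):
    """Fraction of true cases (label 1) that the classifier flags as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Equals the probability that a randomly chosen case receives a higher
    score than a randomly chosen non-case; ties count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = CRC, 0 = no CRC; scores are predicted risks.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.35, 0.4, 0.2, 0.1, 0.6, 0.7]

threshold = 0.5  # assumed cut-off for referring an individual to colonoscopy
y_pred = [1 if s >= threshold else 0 for s in scores]

auc_value = auc(y_true, scores)
recall_value = recall(y_true, y_pred)
print(f"AUC = {auc_value:.3f}, recall = {recall_value:.3f}")
```

AUC is threshold-free, whereas recall depends on the chosen cut-off. In a screening setting, a lower threshold would typically raise recall (fewer missed cancers) at the cost of more colonoscopy referrals.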