Scientific Reports (Jul 2024)
Machine learning-based screening and validation of liver metastasis-specific genes in colorectal cancer
Abstract
Colorectal liver metastasis (CRLM) remains a major challenge in the clinical treatment of colorectal cancer, and few studies have examined how CRLM develops. RNA sequencing data were obtained from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA). Four machine learning algorithms were used to screen hub CRLM-specific genes: Least Absolute Shrinkage and Selection Operator (Lasso), Random Forest, SVM-RFE, and XGBoost. A model for identifying CRLM was developed using stepwise logistic regression and validated in internal and independent datasets. The prognostic value of the hub CRLM-specific genes was assessed using the Lasso-Cox method. In vitro experiments were performed in SW620 cells. The CRLM identification model was built on four CRLM-specific genes (SPP1, ZG16, P2RY14, and PRKAR2B), and its efficacy was validated in GSE41258 and three external cohorts. Five CRLM-specific prognostic hub genes (SPP1, ZG16, P2RY14, CYP2E1, and C5) were identified with the Lasso-Cox algorithm and used to construct a risk score, which was validated in the GSE39582 cohort. Three genes, ZG16, P2RY14, and SPP1, showed both efficacy in identifying CRLM and prognostic value. Immune infiltration and enrichment analyses showed that SPP1 was associated with M2 macrophage polarization and extracellular matrix remodeling. In vitro experiments indicated that SPP1 may act as a cancer-promoting factor. The hub CRLM-specific gene SPP1 may therefore help in assessing the diagnosis, prognosis, and immune infiltration of patients with CRLM.
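In a Lasso-Cox model, a patient's risk score is usually taken as the model's linear predictor: the sum, over the selected genes, of each gene's fitted coefficient multiplied by its expression value. The following is a minimal sketch of that calculation for the five prognostic genes named above. It assumes that expression values are already normalized and that patients are split into high- and low-risk groups at the median score. The coefficients in `COEFFICIENTS` are hypothetical placeholders chosen only so the example runs; they are not the values fitted in the study, and their signs mean nothing. The function names `risk_score` and `stratify` are likewise illustrative.

```python
from statistics import median

# HYPOTHETICAL placeholder coefficients for illustration only.
# These are NOT the study's fitted Lasso-Cox coefficients.
COEFFICIENTS: dict[str, float] = {
    "SPP1": 0.5,
    "ZG16": -0.3,
    "P2RY14": -0.2,
    "CYP2E1": -0.1,
    "C5": 0.1,
}


def risk_score(expression: dict[str, float], coefficients: dict[str, float]) -> float:
    """Return the linear predictor: sum of coefficient * expression over model genes."""
    missing = set(coefficients) - set(expression)
    if missing:
        raise ValueError(f"missing expression values for: {sorted(missing)}")
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def stratify(scores: list[float]) -> list[str]:
    """Label each score 'high' or 'low' relative to the median (an assumed cut-off)."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


if __name__ == "__main__":
    # Toy, made-up expression profiles for two patients.
    patients = [
        {"SPP1": 2.0, "ZG16": 1.0, "P2RY14": 0.5, "CYP2E1": 1.0, "C5": 0.0},
        {"SPP1": 0.5, "ZG16": 2.0, "P2RY14": 1.5, "CYP2E1": 0.5, "C5": 1.0},
    ]
    scores = [risk_score(p, COEFFICIENTS) for p in patients]
    print(scores, stratify(scores))
```

The median split is only one common choice. Studies often pick the cut-off from the training cohort instead (for example, by an optimal-cut-point search). They then apply that fixed threshold to a validation cohort such as GSE39582 rather than recomputing the median there.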
Keywords