Cancer Control (Mar 2024)
Development and Validation of a Machine Learning Prognostic Model of m5C-Related Immune Genes in Lung Adenocarcinoma
Abstract
Background: This retrospective study aimed to develop a prognostic signature for lung adenocarcinoma (LUAD) based on immune-related genes significantly associated with m5C methylation (m5C-IRGs). Methods: We used transcriptome data to screen for m5C-IRGs in The Cancer Genome Atlas (TCGA)-LUAD dataset. The m5C-IRGs associated with survival were then identified by Kaplan-Meier (K-M) analysis. Univariate Cox regression, least absolute shrinkage and selection operator (LASSO) regression, and the xgboost.surv tool were used to build a LUAD prognostic signature. We further performed gene functional enrichment, immune microenvironment, and immunotherapy analyses between the 2 risk subgroups. Finally, we verified the expression of the m5C-IRGs-related prognostic genes at the transcriptional level. Results: A total of 76 m5C-IRGs were identified in the TCGA-LUAD dataset, of which 27 were associated with survival and retained. An m5C-IRGs prognostic signature was then built from 3 prognostic genes (HLA-DMB, PPIA, and GPI). Independent prognostic analysis suggested that stage and RiskScore could serve as independent prognostic factors. The 4104 differentially expressed genes (DEGs) between the 2 risk subgroups were mainly involved in immune receptor pathways. The LUAD immune microenvironment also differed between the 2 risk subgroups. Immunotherapy analysis and chemotherapeutic drug sensitivity results indicated that the m5C-IRGs-related gene signature might serve as a therapy predictor. Finally, PPIA and GPI expression was significantly higher in the LUAD group than in the normal group. Conclusions: A prognostic signature composed of HLA-DMB, PPIA, and GPI based on m5C-IRGs was established, which might provide a theoretical basis and reference value for LUAD research.
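The abstract does not state how RiskScore is computed or how the 2 risk subgroups are defined. As a rough illustration only, the sketch below assumes a common convention for Cox-type signatures: a linear score (coefficient times expression, summed over the signature genes) with samples split at the cohort median. The coefficients, sample names, and expression values are hypothetical placeholders, not values from the study, and the xgboost.surv model used by the authors may not reduce to a linear score at all.

```python
from statistics import median

# Hypothetical coefficients for illustration only (NOT taken from the paper).
COEFS = {"HLA-DMB": -0.30, "PPIA": 0.45, "GPI": 0.25}


def risk_score(expr, coefs=COEFS):
    """Linear score: sum of coefficient * expression over the signature genes."""
    return sum(coefs[gene] * expr[gene] for gene in coefs)


def split_by_median(scores):
    """Label a sample 'high' if its score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {name: ("high" if value > cutoff else "low")
            for name, value in scores.items()}


# Toy expression values (hypothetical samples, arbitrary units).
samples = {
    "S1": {"HLA-DMB": 5.0, "PPIA": 8.0, "GPI": 6.0},
    "S2": {"HLA-DMB": 7.0, "PPIA": 6.0, "GPI": 5.0},
    "S3": {"HLA-DMB": 4.0, "PPIA": 9.0, "GPI": 7.0},
    "S4": {"HLA-DMB": 8.0, "PPIA": 5.0, "GPI": 4.0},
}

scores = {name: risk_score(expr) for name, expr in samples.items()}
groups = split_by_median(scores)

for name in samples:
    print(name, round(scores[name], 2), groups[name])
```

With these toy inputs, samples S1 and S3 fall in the high-risk group and S2 and S4 in the low-risk group. The median cutoff is one of several possible choices; an optimal cutpoint chosen on the training set would be an equally plausible assumption.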
Public Datasets Analyzed in the Study: The TCGA-LUAD dataset was obtained from the TCGA database ( https://portal.gdc.cancer.gov/ ), and GSE31210 (validation set) was retrieved from the GEO database ( https://www.ncbi.nlm.nih.gov/geo/ ).