Frontiers in Medicine (Feb 2022)
Identification of Signature Genes and Characterizations of Tumor Immune Microenvironment and Tumor Purity in Lung Adenocarcinoma Based on Machine Learning
Abstract
Applying the Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data (ESTIMATE) method to characterize the tumor microenvironment (TME) and to derive immune scores and tumor purity is an efficient way to identify and assess biomarkers of immunotherapy response in precision medicine. In this study, we used a machine learning algorithm to analyze lung adenocarcinoma (LUAD) transcriptome data from The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO) database and to evaluate the association between the TME and tumor purity. Furthermore, we investigated whether a smaller set of TME components or a few dominant genes can infer tumor purity. Among the 29 immune-infiltration components quantified by single-sample gene set enrichment analysis (ssGSEA), regression tree and random forest regression methods identified the five TME components [C-C motif chemokine receptor (CCR), T helper cells, checkpoint, regulatory T cells (Treg), and tumor-infiltrating lymphocytes (TIL)] that contributed most to tumor purity prediction. Higher activity of these five immune-infiltration components was associated with significantly lower tumor purity. Because these five TME components also contributed significantly to the improvement in mean squared error (MSE), we selected the genes in these five sets and analyzed survival data to establish a prognostic model. We identified 11 prognosis-related genes and constructed an 11-gene risk model with good predictive value for patient prognosis. Furthermore, using random forest classification and random forest regression, we obtained four genes (GIMAP6, CD80, IL16, and CCR2) that had predictive advantages for tumor purity.
The comprehensive score of genes for tumor purity prediction (CSGTPP), obtained by least absolute shrinkage and selection operator (LASSO) regression, indicated that the four genes could successfully classify samples into high- and low-CSGTPP groups and that tumor purity was negatively correlated with CSGTPP. Survival analysis revealed that a higher CSGTPP was associated with a better patient prognosis. Analysis of the association between cluster of differentiation 274 (CD274) and CSGTPP revealed higher CD274 expression in the high-CSGTPP group. Collectively, we speculate that CSGTPP could serve as a predictor of immunotherapy response and a promising indicator of immunotherapy efficacy.
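As an illustration of how a LASSO-derived composite score such as CSGTPP can be computed and related to tumor purity, the following minimal Python sketch forms a weighted sum of the four genes' expression values, splits samples at the median score, and computes a Pearson correlation with purity. The coefficients, expression values, purity values, and the median cutoff are all hypothetical placeholders chosen for illustration; the abstract does not report the fitted weights or the grouping rule.

```python
from statistics import mean, median

# HYPOTHETICAL coefficients, for illustration only. The abstract does not
# report the fitted LASSO weights for these four genes.
TOY_COEFS = {"GIMAP6": 0.4, "CD80": 0.3, "IL16": 0.2, "CCR2": 0.1}


def composite_score(expression, coefs):
    """Weighted sum of expression values over the genes listed in coefs."""
    return sum(weight * expression[gene] for gene, weight in coefs.items())


def split_by_median(scores):
    """Label each sample 'high' or 'low' relative to the median score.

    The median cutoff is an assumption; the abstract does not state the rule.
    """
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Made-up toy samples (NOT data from the study): expression rises from
# sample to sample while tumor purity falls.
samples = [
    {"GIMAP6": 1.0, "CD80": 1.0, "IL16": 1.0, "CCR2": 1.0},
    {"GIMAP6": 2.0, "CD80": 2.0, "IL16": 2.0, "CCR2": 2.0},
    {"GIMAP6": 3.0, "CD80": 3.0, "IL16": 3.0, "CCR2": 3.0},
    {"GIMAP6": 4.0, "CD80": 4.0, "IL16": 4.0, "CCR2": 4.0},
]
purity = [0.9, 0.7, 0.5, 0.3]

scores = [composite_score(s, TOY_COEFS) for s in samples]
groups = split_by_median(scores)
r = pearson(scores, purity)

print(groups)
print(r < 0)  # a negative r means higher score goes with lower purity
```

In a real analysis the coefficients would come from a LASSO fit on the expression matrix, and the correlation would be computed against ESTIMATE-derived purity rather than toy values.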
Keywords