PeerJ (Oct 2021)
Development of a novel embryonic germline gene-related prognostic model of lung adenocarcinoma
Abstract
Background Emerging evidence links embryonic germline genes to tumor progression and patient outcomes. However, the prognostic value of these genes in lung adenocarcinoma (LUAD) has not been fully studied. Here we systematically evaluated this question and constructed a novel signature and a nomogram based on embryonic germline genes for predicting the outcomes of patients with LUAD.
Methods LUAD cohorts retrieved from The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO) database were used as the training set and testing sets, respectively. The embryonic germline genes were downloaded from the website https://venn.lodder.dev. The differentially expressed embryonic germline genes (DEGGs) between tumor and normal samples were then identified with the limma package. Functional enrichment and pathway analyses were performed with the clusterProfiler package. The prognostic model was constructed by the least absolute shrinkage and selection operator (LASSO)-Cox regression method. Survival and receiver operating characteristic (ROC) analyses were performed to validate the model in the training set and four GEO testing datasets. Finally, a prognostic nomogram based on the signature genes was constructed using multivariate regression.
Results Among the 269 DEGGs identified, 249 were up-regulated and 20 were down-regulated. GO and KEGG analyses revealed that these DEGGs were mainly enriched in cell proliferation and DNA damage repair processes. Univariate Cox regression then identified 103 DEGGs with prognostic value, which were further filtered by the LASSO method. The resulting sixteen DEGGs were entered into stepwise multivariate Cox regression, yielding an eleven-gene embryonic germline gene-related signature (EGRS). The model robustly stratified LUAD patients into high-risk and low-risk groups in both the training and testing sets, and low-risk patients had markedly better outcomes.
Multi-ROC analysis also showed that the EGRS model outperformed common clinicopathological factors in predictive efficacy. The EGRS model likewise showed robust predictive ability in four independent external datasets, with areas under the curve (AUC) of 0.726 (GSE30219), 0.764 (GSE50081), 0.657 (GSE37745) and 0.668 (GSE72094). More importantly, the expression levels of some EGRS genes correlated significantly with the clinicopathological progression of LUAD, suggesting that these genes might play an important role in LUAD progression. Finally, based on the EGRS genes, we built and calibrated a nomogram for conveniently evaluating patients’ outcomes.
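As an illustration of how a Cox-derived signature of this kind is typically applied, the following minimal Python sketch computes a risk score as a weighted sum of gene expression values, splits patients into high-risk and low-risk groups, and estimates an AUC. It is not the authors' implementation: the gene names, coefficients, expression values and outcomes are hypothetical, the median cutoff is an assumption (the abstract does not state the cutoff used), and the AUC shown is a plain rank-based estimate for a binary outcome rather than a time-dependent survival ROC.

```python
from statistics import median

# Hypothetical coefficients: the abstract reports neither the eleven
# signature genes nor their fitted Cox coefficients.
COEFS = {"gene_a": 0.40, "gene_b": -0.25, "gene_c": 0.10}


def risk_score(expr, coefs=COEFS):
    """Linear predictor: sum of coefficient * expression over signature genes."""
    return sum(coefs[g] * expr[g] for g in coefs)


def stratify(scores):
    """Label each patient 'high' or 'low' risk using the median score
    as the cutoff (an assumed, commonly used threshold)."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def auc(scores, events):
    """Rank-based AUC: the fraction of (event, non-event) patient pairs in
    which the patient with the event has the higher score; ties count 0.5."""
    pos = [s for s, e in zip(scores, events) if e]
    neg = [s for s, e in zip(scores, events) if not e]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy expression profiles and outcomes (1 = event observed), invented
# purely for illustration.
patients = [
    {"gene_a": 2.0, "gene_b": 1.0, "gene_c": 0.5},
    {"gene_a": 0.5, "gene_b": 2.0, "gene_c": 1.0},
    {"gene_a": 1.5, "gene_b": 0.5, "gene_c": 2.0},
    {"gene_a": 1.0, "gene_b": 1.5, "gene_c": 0.0},
]
events = [0, 0, 1, 1]

scores = [risk_score(p) for p in patients]
groups = stratify(scores)
print(groups, auc(scores, events))
```

In practice the coefficients would come from the fitted multivariate Cox model, and the AUC at a given follow-up time would be estimated with a time-dependent ROC method that accounts for censoring.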
Keywords