PeerJ (Jul 2024)
Machine learning and bioinformatics analysis of diagnostic biomarkers associated with the occurrence and development of lung adenocarcinoma
Abstract
Objective: Lung adenocarcinoma (LUAD) poses a major global health challenge and is a leading cause of cancer-related deaths worldwide. This study examines three molecular biomarkers, screened by machine learning, that are important in the occurrence and progression of LUAD and have the potential to serve as biomarkers for clinical diagnosis, prognosis evaluation, and treatment guidance.
Methods: Differentially expressed genes (DEGs) were identified from the GSE1987 and GSE18842 gene expression datasets. A comprehensive bioinformatics analysis of these DEGs was conducted to explore enriched functions and pathways, relative expression levels, and interaction networks. Random forest and LASSO regression analyses were used to identify the three most significant target genes. The TCGA database and quantitative polymerase chain reaction (qPCR) experiments were used to verify the expression levels of these three target genes and to evaluate them with receiver operating characteristic (ROC) curves. Furthermore, immune infiltration, pan-cancer, and mRNA-miRNA interaction network analyses were performed.
Results: Eighty-nine genes showed increased expression and 190 genes showed decreased expression. The upregulated DEGs were predominantly associated with organelle fission and nuclear division, whereas the downregulated DEGs were mainly associated with genitourinary system development and cell-substrate adhesion. The DEG protein-protein interaction network revealed 32 and 19 hub genes with the highest moderate values among the upregulated and downregulated genes, respectively. Random forest and LASSO regression analyses of these hub genes identified the three most significant target genes. Their expression levels and ROC curves were verified using the TCGA database and qPCR experiments, and immune infiltration, pan-cancer, and mRNA-miRNA interaction network analyses were performed.
Conclusion: Three target genes identified by machine learning, BUB1B, CENPF, and PLK1, play key roles in the development of lung adenocarcinoma.
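The ROC evaluation of a single candidate gene can be illustrated with a minimal sketch. It assumes that the area under the ROC curve (AUC) is computed from the pairwise ranking of tumor and normal samples by expression level, which is equivalent to the Mann-Whitney statistic. The function name `roc_auc`, the expression values, and the labels are hypothetical and made up for illustration; they are not data or code from the study.

```python
def roc_auc(scores, labels):
    """Return the ROC AUC of `scores` for binary `labels` (1 = tumor, 0 = normal).

    The AUC equals the fraction of (tumor, normal) sample pairs in which the
    tumor sample has the higher score; tied pairs count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up log-scale expression values for one hypothetical upregulated gene.
expression = [7.9, 8.4, 6.1, 9.0, 5.2, 5.8, 6.3, 4.9]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # first four tumor, last four normal

auc = roc_auc(expression, labels)  # 15 of 16 pairs are ordered correctly
print(f"AUC = {auc:.4f}")
```

An AUC near 1 indicates that expression of the gene separates tumor from normal samples well, while a value near 0.5 indicates no discriminative ability. For a downregulated gene the scores would be negated before the calculation.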
Keywords