Scientific Reports (Sep 2024)
Development and validation of machine learning models for diagnosis and prognosis of lung adenocarcinoma, and immune infiltration analysis
Abstract
The aim of our study was to develop robust diagnostic and prognostic models for lung adenocarcinoma (LUAD) using machine learning (ML) techniques, with a focus on immune infiltration in early-stage disease. Feature selection was performed on The Cancer Genome Atlas (TCGA) data using the least absolute shrinkage and selection operator (LASSO), random forest (RF), and support vector machine (SVM) algorithms. Six ML algorithms were used to construct the diagnostic models, which were evaluated with receiver operating characteristic (ROC) curves, precision-recall curves (PRC), and classification error (CE), and validated on the GSE7670 dataset. Additionally, a LASSO Cox prognostic model was built on the TCGA-LUAD dataset and externally validated on four independent Gene Expression Omnibus datasets (GSE30219, GSE31210, GSE50081, and GSE37745). Single-sample gene set enrichment analysis (ssGSEA) was performed to assess immune cell infiltration in stage I LUAD samples, revealing significant differences across immune cell types. These findings demonstrate a positive correlation between immune infiltration in stage I LUAD and Th2 cells, Tcm cells, and T helper cells, and a negative correlation with macrophages, eosinophils, and Tem cells. These insights provide novel perspectives for the clinical diagnosis and treatment of LUAD.
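As an illustration of two of the evaluation metrics named above, the following minimal sketch computes the area under the ROC curve and the classification error for a binary tumor-versus-normal classifier. It is not the pipeline used in the study: the function names `roc_auc` and `classification_error`, the 0.5 decision threshold, and the toy labels and scores are all assumptions introduced here for illustration only.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen positive sample is
    scored higher than a randomly chosen negative one (ties count 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def classification_error(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> float:
    """Fraction of samples misclassified when score >= threshold means positive."""
    wrong = sum(int(s >= threshold) != y for y, s in zip(labels, scores))
    return wrong / len(labels)


# Hypothetical toy data (not from the study): 1 = tumor, 0 = normal;
# the scores are made-up classifier outputs.
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

auc = roc_auc(y_true, y_score)              # 8 of 9 positive/negative pairs ranked correctly
ce = classification_error(y_true, y_score)  # 2 of 6 samples misclassified at threshold 0.5
print(f"AUC = {auc:.3f}, CE = {ce:.3f}")
```

The pairwise formulation is quadratic in the number of samples and is chosen here only for readability; a library routine would normally be preferred on real expression data.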
Keywords