Scientific Reports (Jan 2025)
Combining machine learning and single-cell sequencing to identify key immune genes in sepsis
Abstract
This research aimed to identify novel indicators for sepsis by analyzing RNA sequencing data from peripheral blood samples obtained from sepsis patients (n = 23) and healthy controls (n = 10). A total of 5148 differentially expressed genes were identified using DESeq2 and 5636 using limma (|log2 fold change| ≥ 2, FDR < 0.05). A total of 1793 immune-related genes were retrieved from the ImmPort database, with 358 genes identified in both groups. Next, a biological association network was constructed, and five key hub genes (CD4, HLA-DOB, HLA-DRB1, HLA-DRA, AHNAK) were identified using a combination of three topological analysis algorithms (MCC, Closeness, and MNC) and four machine learning algorithms (Random Forest, LASSO regression, SVM, and XGBoost). Analysis of immune cell distribution showed that the key genes correlated with the infiltration of multiple immune cell types. Gene Set Enrichment Analysis (GSEA) revealed that the key genes were involved in multiple immune response and inflammation-related signaling pathways. Subsequently, diagnostic models were constructed using four machine learning algorithms (Logistic regression, AdaBoost, KNN, and XGBoost) based on the identified key genes. Models with the highest performance were then selected. Ultimately, single-cell sequencing data revealed that the identified key genes were expressed in various immune cells, while quantitative PCR (qPCR) tests confirmed their reduced expression in the sepsis group.
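As an illustration of the screening step described in the abstract, the following minimal Python sketch applies the stated thresholds (|log2 fold change| ≥ 2, FDR < 0.05) to two result tables and intersects the hits with an immune gene list. It is a sketch under stated assumptions, not the authors' pipeline: the gene names and values are invented placeholders, the function name `significant_genes` is hypothetical, and taking the three-way intersection is only one plausible reading of how the overlapping immune-related genes were obtained.

```python
from typing import Dict, Set, Tuple

# Thresholds quoted in the abstract.
LFC_CUTOFF = 2.0
FDR_CUTOFF = 0.05


def significant_genes(results: Dict[str, Tuple[float, float]]) -> Set[str]:
    """Return genes with |log2FC| >= LFC_CUTOFF and FDR < FDR_CUTOFF.

    `results` maps a gene name to a (log2 fold change, FDR) pair.
    """
    return {
        gene
        for gene, (log2fc, fdr) in results.items()
        if abs(log2fc) >= LFC_CUTOFF and fdr < FDR_CUTOFF
    }


# Toy values invented for illustration only; they are NOT from the study.
deseq2_results = {
    "GENE_A": (-2.5, 0.001),
    "GENE_B": (1.2, 0.01),    # fails the fold-change cutoff
    "GENE_C": (3.1, 0.20),    # fails the FDR cutoff
    "GENE_D": (-2.0, 0.04),
}
limma_results = {
    "GENE_A": (-2.2, 0.002),
    "GENE_B": (2.4, 0.03),
    "GENE_D": (-2.3, 0.01),
    "GENE_E": (2.8, 0.001),
}
# Stand-in for an immune-related gene list such as the one from ImmPort.
immune_genes = {"GENE_A", "GENE_B", "GENE_E"}

deseq2_hits = significant_genes(deseq2_results)
limma_hits = significant_genes(limma_results)

# Assumption: candidates are genes significant in both methods
# that also appear in the immune gene list.
candidates = deseq2_hits & limma_hits & immune_genes
print(sorted(candidates))
```

With these placeholder values, only `GENE_A` passes both methods' thresholds and appears in the immune list. In practice the two result tables would come from the DESeq2 and limma outputs rather than hand-written dictionaries.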
Keywords