BMC Medical Genomics (Jun 2023)
Construction and verification of atopic dermatitis diagnostic model based on pyroptosis related biological markers using machine learning methods
Abstract
Objective The aim of this study was to construct a model for the accurate diagnosis of atopic dermatitis (AD) using pyroptosis-related biological markers (PRBMs) and machine learning methods. Method The pyroptosis-related genes (PRGs) were acquired from the Molecular Signatures Database (MSigDB). The chip data of GSE120721, GSE6012, GSE32924, and GSE153007 were downloaded from the Gene Expression Omnibus (GEO) database. The data of GSE120721 and GSE6012 were combined as the training group, while the other two datasets served as the testing groups. Subsequently, the expression of PRGs was extracted from the training group and differential expression analysis was conducted. The CIBERSORT algorithm was used to estimate immune cell infiltration, and a differential analysis of the infiltration levels was conducted. Consensus cluster analysis divided AD patients into different modules according to the expression levels of PRGs. Then, weighted correlation network analysis (WGCNA) was used to screen the key module. For the key module, we used random forest (RF), support vector machine (SVM), extreme gradient boosting (XGB), and generalized linear model (GLM) methods to construct diagnostic models. For the five PRBMs with the highest model importance, we built a nomogram. Finally, the results of the model were validated using the GSE32924 and GSE153007 datasets. Results Nine PRGs differed significantly in expression between normal controls and AD patients. Immune cell infiltration analysis showed that activated CD4+ memory T cells and dendritic cells (DCs) were significantly higher in AD patients than in normal controls, while activated natural killer (NK) cells and resting mast cells were significantly lower in AD patients than in normal controls. Consensus cluster analysis divided the expression matrix into 2 modules. Subsequently, WGCNA showed that the turquoise module differed significantly and had a high correlation coefficient. Then, the machine learning models were constructed, and the results showed that the XGB model was the optimal model. The nomogram was constructed using five PRBMs: HDAC1, GPALPP1, LGALS3, SLC29A1, and RWDD3. Finally, the datasets GSE32924 and GSE153007 verified the reliability of this result. Conclusions The XGB model based on five PRBMs can be used for the accurate diagnosis of AD patients.
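The two selection steps described above (choosing the best of several diagnostic models, then keeping the five most important markers) can be illustrated with a minimal sketch. It assumes that models are compared by ROC AUC, which the abstract does not state, and every function name, model label, and number below is hypothetical toy data, not a result from the study.

```python
from typing import Dict, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: the chance that a random positive sample scores
    higher than a random negative one (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_model(labels: Sequence[int],
               model_scores: Dict[str, Sequence[float]]) -> str:
    """Return the name of the model with the highest AUC (assumed criterion)."""
    aucs = {name: roc_auc(labels, s) for name, s in model_scores.items()}
    return max(aucs, key=aucs.get)


def top_k_features(importance: Dict[str, float], k: int = 5) -> List[str]:
    """Return the k feature names with the largest importance values."""
    return sorted(importance, key=importance.get, reverse=True)[:k]


# Toy example: 1 = patient, 0 = control; scores are made-up predictions.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = {
    "model_a": [0.9, 0.8, 0.4, 0.5, 0.2, 0.1],
    "model_b": [0.9, 0.8, 0.7, 0.3, 0.2, 0.1],
}
chosen = best_model(toy_labels, toy_scores)

# Made-up importance values for placeholder gene names.
toy_importance = {"g1": 0.10, "g2": 0.50, "g3": 0.30, "g4": 0.05}
top_two = top_k_features(toy_importance, k=2)
```

In practice the scores would come from fitted RF, SVM, XGB, and GLM classifiers evaluated on held-out data, and the importance values from the selected model; the pure-Python helpers here only show the comparison and ranking logic.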
Keywords