Iranian Rehabilitation Journal (Jun 2022)
Identifying Gene Signature in RNA Sequencing Multiple Sclerosis Data
Abstract
Objectives: Multiple sclerosis (MS) is a complex disease of the central nervous system that results from a combination of genetic predisposition and a nongenetic trigger. This study aims to identify gene signatures in MS RNA sequencing (RNA-seq) data using a Pareto optimization algorithm.
Methods: This case-control study involved 50 samples (25 MS patients and 25 age-matched healthy individuals), whose expression profiles (GEO series GSE123496) were selected from the National Center for Biotechnology Information Gene Expression Omnibus database. We used Pareto-optimal cluster size identification to find gene signatures in the RNA-seq data. After prefiltering and normalizing the data, we used the limma package to identify differentially expressed genes (DEGs). The Pareto-optimal cluster size for these DEGs was then determined by multi-objective optimization over the alternative clusterings. The RNA-seq data were then clustered via k-means with the selected cluster size. The best cluster was chosen as the signature by calculating the mean of the pairwise Spearman correlation coefficients (SCCs) across all genes in the cluster. All analyses were performed in R (version 4.1.1) in a virtual environment with 100 GB of RAM.
Results: In total, the limma analysis identified 960 DEGs, of which 720 were up-regulated and 240 were down-regulated. Six Pareto-optimal clusters were obtained. The two clusters with the highest mean SCC (0.88 and 0.74) were chosen as the gene signatures.
Discussion: A total of 9 metabolic prognostic genes and 3 biological pathways were identified. These may provide stronger prognostic information for MS patients.
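The signature-selection step described in the Methods (scoring each cluster by the mean of its pairwise Spearman correlation coefficients and keeping the highest-scoring clusters) can be illustrated with a minimal sketch. This is not the authors' code: the study was carried out in R, whereas this sketch uses plain Python, and the function names (`ranks`, `spearman`, `mean_pairwise_scc`, `rank_clusters`), the cluster labels, and the toy expression values are all hypothetical. It assumes the genes have already been assigned to clusters and that no gene has constant expression across samples.

```python
from itertools import combinations
from statistics import mean


def ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))


def mean_pairwise_scc(cluster):
    """Mean Spearman correlation over all gene pairs in one cluster.

    cluster maps a gene name to its expression values across samples.
    """
    return mean(spearman(a, b) for a, b in combinations(cluster.values(), 2))


def rank_clusters(clusters):
    """Return (cluster name, mean SCC) pairs, highest mean SCC first."""
    scores = {name: mean_pairwise_scc(genes) for name, genes in clusters.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): two clusters of three genes over five samples.
clusters = {
    "cluster_A": {
        "g1": [1, 2, 3, 4, 5],
        "g2": [2, 4, 5, 8, 9],
        "g3": [10, 20, 30, 40, 50],
    },
    "cluster_B": {
        "g4": [1, 2, 3, 4, 5],
        "g5": [5, 3, 4, 1, 2],
        "g6": [2, 1, 4, 3, 5],
    },
}

ranked = rank_clusters(clusters)
for name, score in ranked:
    print(f"{name}: mean pairwise SCC = {score:.2f}")
```

In this toy example the genes in `cluster_A` all rise together across samples, so that cluster ranks first. Taking the top entries of `ranked` corresponds to choosing the clusters with the greatest average SCC as candidate signatures.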