Journal of Big Data (Apr 2024)
Gene selection via improved nuclear reaction optimization algorithm for cancer classification in high-dimensional data
Abstract
RNA Sequencing (RNA-Seq) is considered a revolutionary technique for gene profiling and quantification. It offers a comprehensive view of the transcriptome, which makes it a more expansive technique than microarrays. Genes that discriminate malignant from normal samples can be identified from quantitative gene expression. However, this data is a high-dimensional dense matrix in which each sample has more than 20,000 genes, and handling it is challenging. This paper proposes RBNRO-DE (Relief Binary NRO based on Differential Evolution), a gene selection strategy applied to RNA-Seq data (rnaseqv2 illuminahiseq rnaseqv2 un edu Level 3 RSEM genes normalized) with more than 20,000 genes, to pick the most informative genes, and evaluates it on 22 cancer datasets. The k-nearest neighbor (k-NN) and Support Vector Machine (SVM) classifiers are used to assess the quality of the selected genes. The proposed RBNRO-DE algorithm is compared with binary versions of the most common meta-heuristic algorithms. In most of the 22 cancer datasets, RBNRO-DE with the k-NN and SVM classifiers achieved the best convergence and a classification accuracy of up to 100%, combined with a reduction in the number of features of up to 98%, a clear advantage over its counterparts according to Wilcoxon’s rank-sum test (5% significance level).
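As an illustration of how a binary gene subset can be scored by a classifier, the following minimal Python sketch evaluates a 0/1 gene mask with leave-one-out k-NN. It is not the RBNRO-DE algorithm itself: the function names, the toy data, the weight `alpha`, and the weighted-sum objective (classification error plus fraction of selected genes) are assumptions chosen for illustration, since the abstract does not state the exact objective.

```python
import math

def knn_loo_accuracy(X, y, mask, k=1):
    """Leave-one-out k-NN accuracy using only genes where mask[j] == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty gene subset cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        # Euclidean distance from sample i to every other sample
        dists = []
        for m, xm in enumerate(X):
            if m == i:
                continue
            d = math.sqrt(sum((xi[j] - xm[j]) ** 2 for j in cols))
            dists.append((d, y[m]))
        dists.sort(key=lambda t: t[0])
        votes = [label for _, label in dists[:k]]
        # majority vote among the k nearest neighbours (ties broken arbitrarily)
        predicted = max(set(votes), key=votes.count)
        correct += predicted == y[i]
    return correct / len(X)

def fitness(X, y, mask, alpha=0.99, k=1):
    """Hypothetical objective to minimize: weighted error plus subset size.

    alpha is an assumed trade-off weight, not a value taken from the paper.
    """
    error = 1.0 - knn_loo_accuracy(X, y, mask, k)
    selected_ratio = sum(mask) / len(mask)
    return alpha * error + (1.0 - alpha) * selected_ratio

# Toy expression matrix (4 samples x 3 genes); only gene 0 separates the classes.
X = [[0.0, 5.0, 1.0],
     [0.1, 1.0, 9.0],
     [1.0, 5.1, 9.2],
     [1.1, 0.9, 1.1]]
y = [0, 0, 1, 1]

print(fitness(X, y, [1, 0, 0]))  # mask keeping only the informative gene
print(fitness(X, y, [1, 1, 1]))  # mask keeping all genes
```

In this toy setting, the mask that keeps only the informative gene gets a lower (better) fitness than the full gene set, which is the behaviour a wrapper-based search over binary masks relies on.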
Keywords