Soft Computing Letters (Dec 2021)
Using data complexity measures and an evolutionary cultural algorithm for gene selection in microarray data
Abstract
Cancer detection using gene expression data has been a major research trend over the last decade. Microarray gene expression data is among the most challenging types of data because of its high dimensionality and the scarcity of available samples. Feature redundancy adds considerably to the difficulty of the prediction task. It is therefore essential to apply feature selection to reduce the number of features passed to the classifier. In this paper, a novel two-stage framework is proposed to confront the curse of dimensionality in microarray data. It combines data complexity measures with a customized cultural algorithm that incorporates a static belief space into a genetic algorithm in order to reduce the search space and prioritize important genes. Experimental results indicate markedly improved accuracy and a reduction in the number of selected genes compared with state-of-the-art methods on the Gli85, Colon, DLBCL, SMK and CNS datasets.
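
The two-stage idea can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes Fisher's discriminant ratio as the data complexity measure, a toy fitness (nearest-centroid resubstitution accuracy minus a size penalty), and a belief space that simply stores fixed gene weights used to seed and mutate individuals. The abstract states none of these choices, and all function names (`fisher_ratio`, `build_belief_space`, `fitness`, `evolve`, `make_toy_data`) are hypothetical.

```python
import random

def fisher_ratio(X, y):
    """Per-gene Fisher discriminant ratio for a two-class problem (assumed measure)."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, c in zip(X, y) if c == 0]
        b = [row[j] for row, c in zip(X, y) if c == 1]
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        va = sum((v - ma) ** 2 for v in a) / len(a)
        vb = sum((v - mb) ** 2 for v in b) / len(b)
        scores.append((ma - mb) ** 2 / (va + vb + 1e-12))
    return scores

def build_belief_space(scores, keep):
    """Stage 1: keep the top-`keep` genes; their scores become static sampling weights."""
    top = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:keep]
    return top, [scores[j] for j in top]

def fitness(X, y, subset, penalty=0.01):
    """Toy fitness: nearest-centroid accuracy on the same data, minus a size penalty."""
    idx = sorted(subset)
    cents = {}
    for c in (0, 1):
        rows = [[r[j] for j in idx] for r, lab in zip(X, y) if lab == c]
        cents[c] = [sum(col) / len(rows) for col in zip(*rows)]
    correct = 0
    for r, lab in zip(X, y):
        d = {c: sum((r[j] - m) ** 2 for j, m in zip(idx, cents[c])) for c in (0, 1)}
        correct += (min(d, key=d.get) == lab)
    return correct / len(y) - penalty * len(idx)

def evolve(X, y, genes, weights, pop_size=20, generations=30, subset_size=3, seed=0):
    """Stage 2: a small genetic algorithm whose sampling and mutation use the static weights."""
    rng = random.Random(seed)

    def sample():
        # Duplicates collapse, so an individual holds between 1 and subset_size genes.
        return frozenset(rng.choices(genes, weights=weights, k=subset_size))

    pop = [sample() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda s: fitness(X, y, s), reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            pool = sorted(p | q)                # crossover: draw from the parents' union
            child = set(rng.sample(pool, max(1, len(pool) // 2)))
            if rng.random() < 0.3:              # mutation biased by the belief space
                child.add(rng.choices(genes, weights=weights, k=1)[0])
            children.append(frozenset(child))
        pop = parents + children
    return max(pop, key=lambda s: fitness(X, y, s))

def make_toy_data(n=20, n_genes=50, informative=7, seed=1):
    """Synthetic data: Gaussian noise genes plus one gene shifted for class 1."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(n)]
    for row, lab in zip(X, y):
        row[informative] += 5.0 * lab
    return X, y
```

Calling `build_belief_space(fisher_ratio(X, y), keep=10)` on data from `make_toy_data()` and passing the result to `evolve` returns a small gene subset drawn only from the retained genes. A real study would replace the toy fitness with cross-validated classifier accuracy.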