Egyptian Informatics Journal (Dec 2023)
Optimizing microarray cancer gene selection using swarm intelligence: Recent developments and an exploratory study
Abstract
Microarray data are a valuable tool for identifying biomarkers associated with diseases and other biological conditions. Genes, in particular, are a type of biomarker of great importance for identifying and understanding various types of tumors, including brain, lung, and breast cancers. However, a significant portion of the genes in a microarray dataset are not directly associated with the target disease, which leads to challenges during analysis such as increased computational complexity, poor generalization, and decreased classification accuracy. To address this issue, a range of techniques and algorithms have been developed to select the most relevant subset of cancer genes. One highly effective approach is the use of Swarm Intelligence (SI) algorithms, which are known for their efficiency and effectiveness as global search agents. This paper has two distinct but related parts. First, we survey the literature from 2019 to the present on the use of SI algorithms for selecting an optimal subset of cancer genes. Second, based on the analysis and findings of the first part, we present an experimental study that evaluates the efficacy of four classical SI algorithms, namely Particle Swarm Optimization (PSO), Salp Swarm Algorithm (SSA), Firefly Algorithm (FA), and Cuckoo Search (CS), for selecting relevant genes in three different cancer datasets. For the experimental study, we used the Chi-square, Mutual Information, and ANOVA filter methods individually to select 100, 200, and 500 relevant genes from each cancer dataset. We then passed these genes as input to each of the SI algorithms. The results indicate that diverse filter-wrapper combinations can effectively address the challenge of selecting cancer genes across various datasets.
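The filter-wrapper idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic data, the helper names (`anova_f`, `filter_top_k`, `loo_accuracy`, `binary_pso`, `make_data`), the nearest-centroid classifier, the fitness weighting `alpha`, and all PSO constants are assumptions chosen only to keep the example small and dependency-free. An ANOVA F-score filter first keeps the top-k genes, and a binary PSO wrapper then searches for a subset of those genes.

```python
import math
import random
from statistics import mean

def anova_f(column, labels):
    """One-way ANOVA F statistic of one gene across the class groups."""
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(label, []).append(value)
    grand = mean(column)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (len(groups) - 1)) / (within / (len(column) - len(groups)))

def filter_top_k(X, y, k):
    """Filter stage: indices of the k genes with the highest F score."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier (assumed here)."""
    if not genes:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_label, best_dist = None, float("inf")
        for label in sorted(set(y)):
            rows = [X[r] for r in range(len(X)) if r != i and y[r] == label]
            centroid = [mean(row[g] for row in rows) for g in genes]
            dist = sum((X[i][g] - m) ** 2 for g, m in zip(genes, centroid))
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y[i]
    return correct / len(X)

def binary_pso(X, y, candidates, n_particles=8, n_iter=15, alpha=0.9, seed=0):
    """Wrapper stage: binary PSO over a 0/1 mask of the candidate genes."""
    rng = random.Random(seed)
    d = len(candidates)

    def fitness(mask):
        # Assumed fitness: reward accuracy, lightly penalize subset size.
        genes = [g for g, bit in zip(candidates, mask) if bit]
        return alpha * loo_accuracy(X, y, genes) + (1 - alpha) * (1 - len(genes) / d)

    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_particles)]
    vel = [[0.0] * d for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    b = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[b][:], pbest_fit[b]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(d):
                v = (0.7 * vel[i][j]
                     + 1.5 * rng.random() * (pbest[i][j] - pos[i][j])
                     + 1.5 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-6.0, min(6.0, v))
                # Sigmoid transfer: velocity becomes the probability of a 1 bit.
                prob = 1 / (1 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return [g for g, bit in zip(candidates, gbest) if bit], gbest_fit

def make_data(n=30, p=40, informative=3, seed=1):
    """Synthetic two-class data; only the first `informative` genes differ."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(2.0 * label if j < informative else 0.0, 1.0)
          for j in range(p)] for label in y]
    return X, y

X, y = make_data()
top = filter_top_k(X, y, k=10)           # filter: keep 10 of 40 genes
selected, score = binary_pso(X, y, top)  # wrapper: search within those 10
print("filtered genes:", top)
print("selected genes:", selected, "fitness:", round(score, 3))
```

In this sketch the filter shrinks the search space before the swarm runs, which is the motivation for pairing a cheap filter with a more expensive wrapper. Swapping the F score for a Chi-square or Mutual Information score, or PSO for another SI algorithm, would change only `filter_top_k` or `binary_pso` respectively.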