Heliyon (Sep 2023)
Hybrid black widow optimization with iterated greedy algorithm for gene selection problems
Abstract
Gene Selection (GS) is a strategy method targeted at reducing redundancy, limited expressiveness, and low informativeness in gene expression datasets obtained by DNA Microarray technology. These datasets contain a plethora of diverse and high-dimensional samples and genes, with a significant discrepancy in the number of samples and genes present. The complexities of GS are especially noticeable in the context of microarray expression data analysis, owing to the inherent data imbalance. The main goal of this study is to offer a simplified and computationally effective approach to dealing with the conundrum of attribute selection in microarray gene expression data. We use the Black Widow Optimization algorithm (BWO) in the context of GS to achieve this, using two unique methodologies: the unaltered BWO variation and the hybridized BWO variant combined with the Iterated Greedy algorithm (BWO-IG). By improving the local search capabilities of BWO, this hybridization attempts to promote more efficient gene selection. A series of tests was carried out using nine benchmark datasets that were obtained from the gene expression data repository in the pursuit of empirical validation. The results of these tests conclusively show that the BWO-IG technique performs better than the traditional BWO algorithm. Notably, the hybridized BWO-IG technique excels in the efficiency of local searches, making it easier to identify relevant genes and producing findings with higher levels of reliability in terms of accuracy and the degree of gene pruning. Additionally, a comparison analysis is done against five modern wrapper Feature Selection (FS) methodologies, namely BIMFOHHO, BMFO, BHHO, BCS, and BBA, in order to put the suggested BWO-IG method's effectiveness into context. The comparison that follows highlights BWO-IG's obvious superiority in reducing the number of selected genes while also obtaining remarkably high classification accuracy. The key findings were an average classification accuracy of 94.426, average fitness values of 0.061, and an average number of selected genes of 2933.767.