IEEE Access (Jan 2024)
MOBCSA: Multi-Objective Binary Cuckoo Search Algorithm for Features Selection in Bioinformatics
Abstract
In bioinformatics, medical diagnosis models can be significantly impacted by the high-dimensional data generated by high-throughput technologies. Such data contains redundant or irrelevant genes, making it challenging to identify the relevant ones. An effective feature selection (FS) technique is therefore crucial for mitigating dimensionality and thereby enhancing the performance and accuracy of medical diagnosis. The Cuckoo Search Algorithm (CSA) has proven effective in gene selection, demonstrating strong exploitation, exploration, and convergence. However, most current CSA-based FS techniques treat gene selection as a single-objective problem rather than adopting a multi-objective mechanism. This article proposes the Multi-Objective Binary Cuckoo Search Algorithm (MOBCSA) for gene selection. MOBCSA extends the standard CSA by incorporating multiple objectives: classification accuracy and the number of selected genes. It employs an S-shaped transfer function to transform the algorithm's continuous search space into a binary one, and it integrates two components: an external archive that stores the Pareto-optimal solutions attained during the search, and an adaptive crowding-distance updating mechanism within the archive that maintains diversity and increases the coverage of optimal solutions. To assess MOBCSA's performance, experiments were conducted on six benchmark biomedical datasets using three different classifiers, and the results were compared against four multi-objective state-of-the-art FS methods. The findings show that MOBCSA surpasses the other methods in both classification accuracy and number of selected genes, achieving an average accuracy ranging from 92.79% to 98.42% and an average number of selected genes ranging from 15.67 to 27.88 across classifiers and datasets.
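The binarization step mentioned above can be illustrated with a minimal sketch, assuming the common sigmoid form of the S-shaped transfer function, S(x) = 1 / (1 + e^(-x)); the function names and the stochastic thresholding rule shown here are illustrative, not the paper's exact implementation.

```python
import math
import random

def s_shaped_transfer(x):
    """Sigmoid (S-shaped) transfer function: maps a continuous
    position value to a selection probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def binarize(position, rng=random.random):
    """Convert a continuous cuckoo position into a binary gene mask:
    gene i is selected (1) when a uniform draw falls below S(x_i),
    so larger position values make selection more likely."""
    return [1 if rng() < s_shaped_transfer(x) else 0 for x in position]
```

For example, `binarize([2.3, -1.7, 0.4])` might yield `[1, 0, 1]`, where each 1 marks a gene included in the candidate subset evaluated by the classifier.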
Keywords