Scientific African (Mar 2024)
Hybridization of data-driven threshold algorithm with fuzzy particle swarm optimization technique for gene selection in microarray data
Abstract
Microarrays have revolutionized genomics by enabling the simultaneous measurement of thousands of gene expression levels. However, the high dimensionality of microarray data makes it difficult to identify the genes relevant to disease diagnosis and biomarker discovery. This article introduces a novel hybrid approach to gene selection in microarray data that combines a data-driven threshold algorithm with the optimization capabilities of Fuzzy Particle Swarm Optimization (FPSO). The hybrid method pursues several objectives: minimizing the number of genes selected for model training, reducing computational cost, assessing each gene's contribution to the underlying condition, and improving classifier accuracy. The data-driven threshold algorithm automatically determines an optimal threshold value from the characteristics of the dataset, removing the often difficult task of setting the threshold by hand. FPSO, in turn, uses fuzzy logic to set its parameters during the global search and relies on the robustness of the threshold algorithm for its selection criterion. The synergy between FPSO and the threshold algorithm forms the core of the method and allows all of these objectives to be pursued at the same time. Experimental evaluations on real microarray datasets show that the hybrid approach outperforms existing solutions in both gene selection performance and computational efficiency. The selected genes yield improved classification accuracy and show biological relevance, which increases their value for downstream analysis. However, the hybrid algorithm struggled with multi-class microarray datasets, and future work will focus on adapting the method to the specific characteristics of such datasets.
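The abstract does not specify the gene-scoring statistic, the threshold rule, the fuzzy rule base, or the classifier used to judge a gene subset, so the following is only a minimal sketch of how such a hybrid could fit together, under assumptions that are ours and not the authors'. Genes are scored with an absolute Welch t-statistic. The threshold is a hypothetical data-driven rule (mean plus `k` standard deviations of those scores). FPSO searches only over genes that pass the threshold. The inertia weight comes from a toy three-rule fuzzy system over search progress, and fitness is leave-one-out nearest-centroid accuracy minus a small gene-count penalty. Every function name and parameter value (`gene_scores`, `data_driven_threshold`, `fuzzy_inertia`, `fpso_select`, `alpha`, `c1`, `c2`) is hypothetical, and, like the method described above, the sketch handles only two classes.

```python
import numpy as np


def gene_scores(X, y):
    """Absolute Welch t-statistic per gene (assumes labels 0 and 1 only)."""
    a, b = X[y == 0], X[y == 1]
    diff = np.abs(a.mean(axis=0) - b.mean(axis=0))
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return diff / (se + 1e-12)


def data_driven_threshold(scores, k=1.0):
    """Hypothetical rule: threshold = mean + k * std of the score distribution."""
    return scores.mean() + k * scores.std()


def fuzzy_inertia(progress):
    """Toy zero-order Sugeno system mapping search progress in [0, 1] to an
    inertia weight. Hypothetical rules: early -> 0.9, middle -> 0.65, late -> 0.4."""
    t = float(np.clip(progress, 0.0, 1.0))
    early = max(0.0, 1.0 - 2.0 * t)              # triangular membership functions
    middle = max(0.0, 1.0 - 2.0 * abs(t - 0.5))
    late = max(0.0, 2.0 * t - 1.0)
    # Defuzzify as the membership-weighted average of the rule outputs.
    return (early * 0.9 + middle * 0.65 + late * 0.4) / (early + middle + late)


def loo_nearest_centroid_acc(X, y):
    """Leave-one-out accuracy of a two-class nearest-centroid classifier."""
    n = len(y)
    correct = 0
    for i in range(n):
        keep = np.arange(n) != i
        Xt, yt = X[keep], y[keep]
        d0 = np.sum((X[i] - Xt[yt == 0].mean(axis=0)) ** 2)
        d1 = np.sum((X[i] - Xt[yt == 1].mean(axis=0)) ** 2)
        pred = 1 if d1 < d0 else 0
        correct += int(pred == y[i])
    return correct / n


def fpso_select(X, y, candidates, n_particles=10, iters=20, alpha=0.05, seed=0):
    """PSO over threshold-passing genes; a gene is selected when its position > 0.5."""
    rng = np.random.default_rng(seed)
    dim = len(candidates)
    pos = rng.random((n_particles, dim))
    vel = np.zeros((n_particles, dim))

    def fitness(p):
        chosen = candidates[p > 0.5]
        if chosen.size == 0:
            return -np.inf
        # Reward accuracy, penalize the fraction of candidate genes kept.
        return loo_nearest_centroid_acc(X[:, chosen], y) - alpha * chosen.size / dim

    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    gbest = pbest[np.argmax(pbest_fit)].copy()
    gbest_fit = pbest_fit.max()
    c1 = c2 = 1.5
    for it in range(iters):
        w = fuzzy_inertia(it / max(iters - 1, 1))  # fuzzy-controlled inertia
        r1, r2 = rng.random((2, n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0.0, 1.0)
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        if pbest_fit.max() > gbest_fit:
            gbest_fit = pbest_fit.max()
            gbest = pbest[np.argmax(pbest_fit)].copy()
    return candidates[gbest > 0.5], float(gbest_fit)


# Usage on synthetic data: 40 samples x 200 genes, genes 0-4 differentially expressed.
rng = np.random.default_rng(42)
X = rng.normal(size=(40, 200))
y = np.repeat([0, 1], 20)
X[y == 1, :5] += 3.0
scores = gene_scores(X, y)
candidates = np.flatnonzero(scores > data_driven_threshold(scores))
genes, fit = fpso_select(X, y, candidates)
print("candidates:", candidates, "selected:", genes, "fitness:", round(fit, 3))
```

In this sketch the threshold shrinks the swarm's search space before any classifier is trained, which is one plausible reading of how the threshold could lower computational cost; the fuzzy inertia schedule simply shifts the swarm from exploration toward exploitation as the search progresses.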