IEEE Access (Jan 2024)

Feature Selection of Gene Expression Data Using a Modified Artificial Fish Swarm Algorithm With Population Variation

  • Zong-Zheng Li,
  • Fang-Ling Wang,
  • Feng Qin,
  • Yusliza Binti Yusoff,
  • Azlan Mohd Zain

DOI
https://doi.org/10.1109/ACCESS.2024.3402652
Journal volume & issue
Vol. 12
pp. 72688 – 72706

Abstract


Microarray data are of great significance for cancer identification at the gene level. In a microarray dataset, only a small number of characteristic genes contribute significantly to cancer classification and identification. Extracting a small number of characteristic genes from a large volume of microarray data is a classic NP-hard problem. This paper proposes a practical hybrid approach to feature selection of gene expression data from microarrays, combining the F-score algorithm with an improved artificial fish swarm algorithm with population variation (FSA-PV). First, the F-score algorithm eliminates a large number of useless and redundant features from the dataset. Then, FSA-PV is introduced to escape local optima while retaining the best features of the subset as far as possible; an adaptive step and visual are used to adjust the search space and movement range of the algorithm in different environments, improving both its local and global optimization abilities. In addition, a naive Bayes classifier is used to test the classification accuracy of the selected subsets. Eight classical datasets are used to verify the performance of the proposed mechanism in the experiments. The results show that the classification accuracy of FSA-PV is significantly superior to that of other algorithms on the Breast dataset, and that the classification accuracy exceeds 90% in 8 cases. This further indicates the robustness and feasibility of FSA-PV in the gene selection process.
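The F-score pre-filtering step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common two-class form of the F-score (squared distance of each class mean from the overall mean, divided by the sum of the per-class sample variances) and labels coded as 0 and 1; the abstract does not state which variant the paper uses or how many genes it keeps. The names `f_scores` and `top_k_features` and the cutoff `k` are hypothetical.

```python
from statistics import mean, variance


def f_scores(X, y):
    """Two-class F-score for every feature column of X (rows are samples).

    Assumes labels are 0/1 and each class has at least two samples,
    since statistics.variance needs two or more data points.
    """
    scores = []
    for column in zip(*X):  # iterate over features (genes)
        pos = [v for v, label in zip(column, y) if label == 1]
        neg = [v for v, label in zip(column, y) if label == 0]
        overall = mean(column)
        # Between-class separation of the two class means.
        between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
        # Within-class spread, using sample variances (n - 1 denominator).
        within = variance(pos) + variance(neg)
        # Small epsilon (an assumption) guards against constant features.
        scores.append(between / (within + 1e-12))
    return scores


def top_k_features(X, y, k):
    """Indices of the k highest-scoring features, best first."""
    scores = f_scores(X, y)
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

In a pipeline like the one described, the retained indices would define the reduced search space handed to the swarm-based selection stage; the swarm stage itself is not sketched here because the abstract gives no details of its update rules.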

Keywords