Scientific Reports (Mar 2023)

A new hybrid algorithm for three-stage gene selection based on whale optimization

  • Junjian Liu,
  • Chiwen Qu,
  • Lupeng Zhang,
  • Yifan Tang,
  • Jinlong Li,
  • Huicong Feng,
  • Xiaomin Zeng,
  • Xiaoning Peng

DOI
https://doi.org/10.1038/s41598-023-30862-y
Journal volume & issue
Vol. 13, no. 1
pp. 1 – 12

Abstract

In biomedical data mining, the number of genes (features) is often much larger than the number of samples. To address this problem, a feature selection algorithm is needed to select subsets of feature genes that correlate strongly with the phenotype, so that subsequent analysis remains accurate. This paper presents a new three-stage hybrid feature gene selection method that combines a variance filter, an extremely randomized tree, and the whale optimization algorithm. First, a variance filter reduces the dimension of the feature gene space. Second, an extremely randomized tree further reduces the feature gene set. Finally, the whale optimization algorithm selects the optimal feature gene subset. We evaluate the proposed method with three different classifiers on seven published gene expression profile datasets and compare it with other advanced feature selection algorithms. The results show that the proposed method has significant advantages across a variety of evaluation indicators.
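To make the staged pipeline concrete, the following is a minimal sketch of the first and third stages only, written in dependency-free Python. It is not the authors' implementation. The extremely randomized tree stage is omitted to keep the sketch self-contained; in practice an importance ranking from a tree ensemble would sit between the two stages shown. Everything named here is an assumption made for illustration: the function names, the leave-one-out 1-nearest-neighbour fitness, the subset-size penalty, the sign-threshold binarization of whale positions, and all default parameter values. The position updates follow the commonly published form of the whale optimization algorithm (encircling, random search, and spiral moves), using one scalar A and C per whale.

```python
import math
import random
from statistics import pvariance


def variance_filter(X, threshold):
    """Stage 1: indices of columns whose population variance exceeds threshold."""
    return [j for j in range(len(X[0]))
            if pvariance([row[j] for row in X]) > threshold]


def loo_1nn_accuracy(X, y, cols):
    """Leave-one-out 1-nearest-neighbour accuracy using only the columns in cols."""
    if not cols:
        return 0.0
    hits = 0
    for i, xi in enumerate(X):
        best_d, best_label = float("inf"), None
        for k, xk in enumerate(X):
            if k == i:
                continue
            d = sum((xi[j] - xk[j]) ** 2 for j in cols)
            if d < best_d:
                best_d, best_label = d, y[k]
        hits += best_label == y[i]
    return hits / len(X)


def binary_woa(fitness, dim, n_whales=8, n_iter=30, b=1.0, seed=0):
    """Stage 3: maximize fitness(mask) over 0/1 masks of length dim."""
    rng = random.Random(seed)

    def to_mask(pos):
        # Assumed binarization: v > 0, equivalent to sigmoid(v) > 0.5.
        return [1 if v > 0 else 0 for v in pos]

    whales = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_whales)]
    best_pos, best_fit = None, float("-inf")
    for w in whales:
        f = fitness(to_mask(w))
        if f > best_fit:
            best_pos, best_fit = w[:], f

    for t in range(n_iter):
        a = 2.0 * (1.0 - t / n_iter)  # decreases linearly from 2 towards 0
        for i, w in enumerate(whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # Encircle the best whale (|A| < 1) or explore around a random one.
                ref = best_pos if abs(A) < 1 else rng.choice(whales)
                new = [ref[d] - A * abs(C * ref[d] - w[d]) for d in range(dim)]
            else:
                # Spiral move towards the best whale.
                l = rng.uniform(-1, 1)
                spiral = math.exp(b * l) * math.cos(2 * math.pi * l)
                new = [abs(best_pos[d] - w[d]) * spiral + best_pos[d]
                       for d in range(dim)]
            whales[i] = [max(-4.0, min(4.0, v)) for v in new]  # keep bounded
        for w in whales:
            f = fitness(to_mask(w))
            if f > best_fit:
                best_pos, best_fit = w[:], f
    return to_mask(best_pos), best_fit


def select_genes(X, y, var_threshold=0.01, size_penalty=0.05, seed=0):
    """Chain stage 1 and stage 3; returns (selected column indices, score)."""
    kept = variance_filter(X, var_threshold)
    if not kept:
        return [], 0.0

    def fitness(mask):
        cols = [j for j, m in zip(kept, mask) if m]
        # Assumed objective: accuracy minus a small penalty on subset size.
        return loo_1nn_accuracy(X, y, cols) - size_penalty * len(cols) / len(kept)

    mask, score = binary_woa(fitness, len(kept), seed=seed)
    return [j for j, m in zip(kept, mask) if m], score
```

Given a samples-by-genes matrix X (a list of rows) and a label list y, select_genes(X, y) returns the indices of the retained columns and the fitness of that subset. Because the wrapper stage re-evaluates a classifier for every candidate mask, filtering first keeps the search space and the cost per evaluation small, which is the motivation the abstract gives for the staged design.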