Mathematics (Oct 2020)

Binary Whale Optimization Algorithm for Dimensionality Reduction

  • Abdelazim G. Hussien,
  • Diego Oliva,
  • Essam H. Houssein,
  • Angel A. Juan,
  • Xu Yu

DOI
https://doi.org/10.3390/math8101821
Journal volume & issue
Vol. 8, no. 10
p. 1821

Abstract


Feature selection (FS) is regarded as a global combinatorial optimization problem. FS simplifies high-dimensional datasets and improves their quality by selecting prominent features and removing irrelevant and redundant data, which leads to good classification results. Because it reduces dimensionality and improves classification accuracy, FS is of great importance in fields such as pattern classification, data analysis, and data mining. The main problem is to find the best subset of features that contains the representative information of all the data. To address this problem, two binary variants of the whale optimization algorithm (WOA), called bWOA-S and bWOA-V, are proposed. They reduce the complexity and increase the performance of a system by selecting significant features for classification purposes. The first version, bWOA-S, uses the sigmoid transfer function to convert WOA's continuous values to binary ones, whereas the second version, bWOA-V, uses a hyperbolic tangent transfer function. The two binary variants were compared with well-known optimization algorithms in this domain, namely the Particle Swarm Optimizer (PSO), three variants of the binary ant lion optimizer (bALO1, bALO2, and bALO3), the binary Dragonfly Algorithm (bDA), and the original WOA, over 24 benchmark datasets from the UCI repository. Finally, the non-parametric Wilcoxon rank-sum test was carried out at the 5% significance level to verify statistically the effectiveness of the two proposed algorithms compared with the other algorithms. The qualitative and quantitative results showed that the two introduced variants are able to minimize the number of selected features and maximize classification accuracy within an appropriate time.
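The abstract names the two transfer functions but not the exact bit-update rules. The following is a minimal Python sketch under stated assumptions: it follows the common convention for S-shaped and V-shaped transfer functions in binary metaheuristics, where the sigmoid value is read as the probability that a bit is 1 and the |tanh| value as the probability of flipping the current bit. The function names, the flip rule, and the weighted fitness (with the hypothetical weight `alpha = 0.99`, trading classification error against the fraction of selected features) are illustrative assumptions, not the authors' published implementation.

```python
import math
import random
from typing import List, Sequence


def sigmoid_transfer(x: float) -> float:
    """S-shaped transfer function mapping a real value into (0, 1).

    Written in a numerically stable form so large |x| does not overflow.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def tanh_transfer(x: float) -> float:
    """V-shaped transfer function built from tanh; |tanh(x)| lies in [0, 1]."""
    return abs(math.tanh(x))


def binarize_s(position: Sequence[float], rng: random.Random) -> List[int]:
    """Assumed S-shaped rule: each bit is set to 1 with probability sigmoid(x)."""
    return [1 if rng.random() < sigmoid_transfer(x) else 0 for x in position]


def binarize_v(position: Sequence[float], current: Sequence[int],
               rng: random.Random) -> List[int]:
    """Assumed V-shaped rule: flip the current bit with probability |tanh(x)|,
    otherwise keep it unchanged."""
    return [1 - b if rng.random() < tanh_transfer(x) else b
            for x, b in zip(position, current)]


def fitness(error_rate: float, selected: Sequence[int], alpha: float = 0.99) -> float:
    """Hypothetical FS objective (lower is better): a weighted sum of the
    classification error rate and the ratio of selected features."""
    if not selected:
        raise ValueError("feature mask must not be empty")
    ratio = sum(selected) / len(selected)
    return alpha * error_rate + (1.0 - alpha) * ratio
```

For example, `binarize_s([2.5, -1.0, 0.0], random.Random(0))` returns a 0/1 feature mask in which features with larger position values are more likely to be selected. Under the V-shaped rule, a position value of 0 always leaves the corresponding bit unchanged, because |tanh(0)| = 0, so the search only changes a feature's status when the whale's continuous position moves away from zero.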

Keywords