IEEE Access (Jan 2021)

Multiple Filter-Based Rankers to Guide Hybrid Grasshopper Optimization Algorithm and Simulated Annealing for Feature Selection With High Dimensional Multi-Class Imbalanced Datasets

  • Abdulrauf Garba Sharifai,
  • Zurinahni Binti Zainol

DOI
https://doi.org/10.1109/ACCESS.2021.3081366
Journal volume & issue
Vol. 9
pp. 74127 – 74142

Abstract


DNA microarray data analysis is notoriously difficult because of the massive number of features, imbalanced class distributions, and the small number of available samples. In this paper, we focus on high-dimensional multi-class imbalanced problems. The combination of high dimensionality and multi-class imbalance poses acute challenges for conventional classifiers, which struggle to classify both the minority and the majority classes effectively. Numerous efforts have been devoted to addressing either high dimensionality or class imbalance. Nonetheless, because of the intricate interactions between the two, few methods have been proposed to address high-dimensional and multi-class imbalanced problems concurrently. This paper presents novel hybrid algorithms for feature selection on high-dimensional multi-class imbalanced data using multiple filter-based rankers (MFR) and a hybrid Grasshopper Optimization Algorithm (GOA). The Simulated Annealing (SA) algorithm is incorporated into GOA and is used to enhance the best solution found by GOA. The aim of SA here is to tackle GOA's slow convergence and to improve exploitation by searching the high-quality regions that GOA has found. The experimental results confirm the effectiveness of the proposed methods in improving classification performance, measured by the area under the curve (AUC), compared with other well-known methods. This demonstrates the ability of the proposed methods to search the feature space and to identify robust, discriminative features that best predict the minority class.
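The abstract describes SA as a refinement step applied to the best feature subset found by GOA, but it does not give the neighbourhood move, cooling schedule, or fitness function. The following is therefore only a minimal generic sketch, not the authors' implementation. It assumes a binary feature mask (1 = selected), a single-bit-flip neighbourhood, geometric cooling, and a hypothetical `toy_fitness` standing in for a classifier's AUC; the names `anneal_mask`, `t0`, `alpha`, and `iters` are illustrative choices.

```python
import math
import random
from typing import Callable, List, Tuple

Mask = List[int]  # 1 = feature selected, 0 = feature dropped


def anneal_mask(
    mask: Mask,
    fitness: Callable[[Mask], float],
    t0: float = 1.0,
    alpha: float = 0.95,
    iters: int = 200,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Refine a binary feature mask by simulated annealing (maximizing fitness).

    Hypothetical sketch: parameters and move type are assumptions, not taken
    from the paper.
    """
    rng = random.Random(seed)
    current = list(mask)  # copy so the caller's mask is not mutated
    current_fit = fitness(current)
    best, best_fit = list(current), current_fit
    t = t0
    for _ in range(iters):
        # Neighbour move: flip one randomly chosen feature in or out of the subset.
        candidate = list(current)
        i = rng.randrange(len(candidate))
        candidate[i] = 1 - candidate[i]
        cand_fit = fitness(candidate)
        delta = cand_fit - current_fit
        # Metropolis rule: always accept improvements; accept a worse subset
        # with probability exp(delta / t), which shrinks as t cools.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_fit = candidate, cand_fit
            if current_fit > best_fit:
                best, best_fit = list(current), current_fit
        t *= alpha  # geometric cooling schedule
    return best, best_fit


# Hypothetical toy fitness: features 0 and 2 are "informative" and each
# selected feature costs 0.1 (a stand-in for a real AUC-based score).
def toy_fitness(m: Mask) -> float:
    return m[0] + m[2] - 0.1 * sum(m)


start = [0, 1, 0, 1, 1]  # e.g. the best mask returned by the GOA phase
best, score = anneal_mask(start, toy_fitness)
print(best, score)
```

Seeding SA with GOA's best mask, rather than a random one, mirrors the abstract's framing of SA as an exploitation step around high-quality regions. In a real pipeline, `fitness` would train and evaluate a classifier (for example, by cross-validated AUC), and that evaluation would dominate the run time.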

Keywords