IEEE Access (Jan 2022)

A Meta-Analysis Survey on the Usage of Meta-Heuristic Algorithms for Feature Selection on High-Dimensional Datasets

  • Li Yu Yab,
  • Noorhaniza Wahid,
  • Rahayu A. Hamid

DOI
https://doi.org/10.1109/ACCESS.2022.3221194
Journal volume & issue
Vol. 10
pp. 122832 – 122856

Abstract

Feature selection (FS) using meta-heuristic algorithms on high-dimensional datasets (HDD) is becoming more prevalent owing to continuous advances in data mining. However, there is no agreed threshold for the number of features at which a dataset is categorised as HDD, because different schools of thought exist on the matter. This survey therefore aimed to determine that threshold, and subsequently to identify the trending or most promising FS method for HDD and the most preferred meta-heuristic algorithms and classifiers for both wrapper-based and filter-based FS methods. The study performed an extensive systematic literature review, following the PRISMA guidelines, of 62 research articles published between 2016 and 2021. It proposed a novel grouping technique called literal grouping and data grouping (LGDG) to accurately group the chosen articles based on HDD. The LGDG method serves as a guide for other researchers who intend to perform FS research related to HDD. Literal grouping refers to selecting papers by searching for specific keywords, such as HDD in this case, while data grouping compares the number of features in the datasets against the threshold, which is set by the majority at 2,000 features. Based on the analyses of all the LGDG groupings, the filter-based FS method has gained more attention in recent years, with results as competent as those of the wrapper-based method, especially on HDD. In addition, Moth Flame Optimisation works well in filter-based methods, Cuckoo Optimisation Algorithm works well in wrapper-based methods, and Whale Optimisation Algorithm works well in both. As for classifier preferences, SVM, DT, and NB are preferred with filter-based methods, while KNN is preferred with wrapper-based methods. Future studies could extend this survey by reviewing other aspects, such as multi-objective FS on HDD, and by including more FS methods.
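
The two LGDG criteria described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Article` record, its field names, the keyword list, and the choice to treat the 2,000-feature threshold as inclusive are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Threshold reported in the abstract; treating it as inclusive is an assumption.
HDD_FEATURE_THRESHOLD = 2000

# Hypothetical search terms standing in for the survey's "specific keywords".
HDD_TERMS: Tuple[str, ...] = ("high-dimensional", "high dimensional")


@dataclass
class Article:
    """Hypothetical record for one reviewed article."""
    title: str
    keywords: List[str]
    max_features: int  # largest feature count among the datasets the article uses


def literal_group(article: Article, terms: Tuple[str, ...] = HDD_TERMS) -> bool:
    """Literal grouping: does the title or keyword list mention an HDD term?"""
    text = " ".join([article.title, *article.keywords]).lower()
    return any(term in text for term in terms)


def data_group(article: Article, threshold: int = HDD_FEATURE_THRESHOLD) -> bool:
    """Data grouping: does the article use a dataset at or above the threshold?"""
    return article.max_features >= threshold
```

Under these assumptions, an article can satisfy either criterion independently, so a paper that never uses the term HDD may still fall into the data group if its largest dataset has 2,000 or more features.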

Keywords