IEEE Access (Jan 2020)
A Solution to the High-Dimensional Classification Problem Using an Improved Hybrid Feature Selection Algorithm Guided by Interaction Information
Abstract
This paper addresses the high-dimensional classification problem, an important problem in machine learning. When the data have a very large number of features, the performance of a given classifier can degrade because there are too few training samples relative to the number of features. One way to cope with this problem is to perform feature selection to reduce the number of features. We propose a new hybrid feature selection algorithm based on interaction information that improves upon a previous one. The improved method uses interaction information to select candidate features to add to the current feature subset. Cohen's d is used as the significance test that decides whether a new feature is permanently added to the subset. We adopt new stopping criteria to allow a more intensive search. Our search method is efficient and is able to find excellent solutions. Experimental results on eleven high-dimensional data sets show that, compared with other hybrid feature selection algorithms, the proposed algorithm achieves high classification accuracy while requiring only a small number of features.
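The two quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes discrete (or already discretized) features, uses the sign convention I(X;Y;C) = I(X,Y;C) - I(X;C) - I(Y;C) for interaction information (other conventions flip the sign), and computes Cohen's d with a pooled standard deviation. All function names are hypothetical.

```python
import math
from collections import Counter


def entropy(*columns):
    """Shannon entropy (in bits) of the joint distribution of discrete columns."""
    joint = list(zip(*columns))
    n = len(joint)
    return -sum((c / n) * math.log2(c / n) for c in Counter(joint).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def interaction_information(x, y, c):
    """Assumed convention: I(X;Y;C) = I(X,Y;C) - I(X;C) - I(Y;C).

    Positive values mean the two features together tell us more about
    the class C than the sum of what each tells us alone.
    """
    xy = list(zip(x, y))  # treat the feature pair as one joint variable
    return (mutual_information(xy, c)
            - mutual_information(x, c)
            - mutual_information(y, c))


def cohens_d(a, b):
    """Cohen's d between two samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    var_a = sum((v - mean_a) ** 2 for v in a) / (na - 1)
    var_b = sum((v - mean_b) ** 2 for v in b) / (nb - 1)
    pooled = math.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))
    return (mean_a - mean_b) / pooled


# XOR example: neither feature alone is informative about the class,
# but the pair determines it completely.
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
c = [0, 1, 1, 0]
xor_interaction = interaction_information(x, y, c)

# Hypothetical accuracy scores with and without a candidate feature.
effect = cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
```

In a hybrid scheme of the kind the abstract describes, a score such as `xor_interaction` could rank candidate features, and an effect size such as `effect`, computed over classifier accuracies with and without the candidate, could decide whether the candidate is kept. The threshold for that decision is not given in the abstract and is left open here.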
Keywords