IEEE Access (Jan 2021)
Feature Selection Methods Based on Symmetric Uncertainty Coefficients and Independent Classification Information
Abstract
Feature selection is a critical step in data preprocessing for pattern recognition and machine learning. At its core, feature selection analyzes and quantifies the relevance, irrelevance, and redundancy between features and class labels. While existing feature selection methods offer multiple interpretations of these relationships, they ignore the multi-value bias of class-independent features and the redundancy of class-dependent features. This paper therefore proposes a feature selection method named MICIMR (Maximal Independent Classification Information and Minimal Redundancy). First, the relevance and redundancy terms of class-independent features are computed from the symmetric uncertainty coefficient. Second, the relevance and redundancy terms of class-dependent features are computed according to the independent classification information criterion. Finally, the selection criteria for these two kinds of features are combined. To verify the effectiveness of the MICIMR algorithm, it is compared with five other feature selection methods on fifteen real datasets. The experimental results demonstrate that MICIMR outperforms the competing algorithms in terms of redundancy rate as well as classification accuracy (Gmean_macro and F1_macro).
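As a point of reference for the first step described above, the symmetric uncertainty coefficient is the standard normalized mutual information measure SU(X, Y) = 2·I(X; Y) / (H(X) + H(Y)). The sketch below is illustrative only (the function names and discrete-variable assumption are ours, not from the paper):

```python
import numpy as np
from collections import Counter

def entropy(values):
    # Shannon entropy (base 2) of a discrete sequence
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def symmetric_uncertainty(x, y):
    # SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), normalized to [0, 1];
    # it corrects the multi-value bias of raw information gain.
    hx, hy = entropy(x), entropy(y)
    hxy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mi = hx + hy - hxy               # mutual information I(X; Y)
    denom = hx + hy
    return 0.0 if denom == 0 else 2.0 * mi / denom
```

For example, a feature identical to the class label yields SU = 1, while a feature independent of it yields SU = 0.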
Keywords