IEEE Access (Jan 2024)

Efficient Feature Ranking and Selection Using Statistical Moments

  • Yael Hochma,
  • Yuval Felendler,
  • Mark Last

DOI
https://doi.org/10.1109/ACCESS.2024.3412851
Journal volume & issue
Vol. 12
pp. 105573–105587

Abstract

Unsupervised feature selection methods can be more efficient than supervised methods, which rely on the expensive and time-consuming data labeling process. This paper introduces skewness as a novel, unsupervised, and computationally efficient feature ranking metric, suitable for both classification and regression tasks. Its feature selection effectiveness is compared to several state-of-the-art supervised and unsupervised feature ranking and selection methods. Both theoretical analysis and empirical evaluation on several popular classification and regression algorithms show that statistical moment-based feature selection algorithms are competitive, in terms of accuracy and mean squared error (MSE), with state-of-the-art supervised approaches to feature ranking and selection, including Fast Correlation Based Filter (FCBF), Minimum Redundancy Maximum Relevance (MRMR), and Mutual Information Maximization (MIM). We also present a mathematical proof, based on common assumptions, which explains the high effectiveness of statistical moments in the feature ranking procedure. Moreover, statistical moment-based feature selection is shown empirically to run faster, on average, than the supervised approaches and the unsupervised Laplacian Score method. Additionally, skewness-based feature selection, in contrast to variance-based selection, does not depend on data normalization, which requires additional computational time and may affect the feature ranking results.
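As a rough illustration of the idea described in the abstract, the sketch below ranks features by their sample skewness without using any labels. This is not the authors' implementation: the ranking direction (descending absolute skewness) and the function name are assumptions made purely for illustration.

```python
# Minimal sketch of unsupervised, skewness-based feature ranking.
# NOTE: not the paper's exact procedure; sorting by absolute skewness
# is an illustrative assumption.
import numpy as np
from scipy.stats import skew


def rank_features_by_skewness(X: np.ndarray) -> np.ndarray:
    """Return feature indices ordered by descending absolute skewness.

    X: 2-D array of shape (n_samples, n_features). No labels are needed,
    so the ranking is fully unsupervised and requires only a single pass
    over the data to estimate the third standardized moment per feature.
    """
    feature_skew = skew(X, axis=0)            # third standardized moment per column
    return np.argsort(-np.abs(feature_skew))  # most-skewed features first


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data: one heavily skewed feature, one roughly symmetric feature.
    X = np.column_stack([rng.exponential(size=1000), rng.normal(size=1000)])
    print(rank_features_by_skewness(X))  # expected order: [0 1]
```

Because skewness is a standardized moment, such a ranking does not require prior data normalization, which is the practical advantage over variance-based selection noted in the abstract.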

Keywords