IEEE Access (Jan 2021)

Empirical Comparison of the Feature Evaluation Methods Based on Statistical Measures

  • Adam Lysiak,
  • Miroslaw Szmajda

DOI
https://doi.org/10.1109/ACCESS.2021.3058428
Journal volume & issue
Vol. 9
pp. 27868 – 27883

Abstract

One of the most important classification problems is selecting proper features, i.e. features that describe the classified object in the most straightforward way possible. One of the biggest challenges of feature selection is evaluating the quality of a feature, and there is a plethora of feature evaluation methods in the literature. This paper presents the results of a comparison between nine selected feature evaluation methods, both existing in the literature and newly defined. For the comparison, features from ten different data sets were evaluated by every method. Then, from every feature set, the best subset (according to each method) was chosen. Those subsets were then used to train a set of classifiers (including decision trees and forests, linear discriminant analysis, naive Bayes, support vector machines, k-nearest neighbors, and an artificial neural network). The maximum accuracy of those classifiers, as well as the standard deviation between their accuracies, were used as quality measures of each method. Furthermore, it was determined which method is the most universal with respect to the data set, i.e. for which method the obtained accuracies depended least on the feature set. Finally, the computation time of each method was compared. Results indicated that, for applications with limited computational power, the method based on the average overlap between feature values seems best suited: it led to high accuracies and proved fast to compute. However, if the data set is known to be normally distributed, the method based on the two-sample t-test may be preferable.
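To make the workflow concrete, the following Python sketch illustrates the general idea: score each feature with a two-sample t-test and with an overlap heuristic, keep the top-ranked features, then train several classifiers and report the maximum accuracy and its spread. The overlap_score here is an illustrative heuristic and load_breast_cancer a stand-in data set; neither is the authors' exact definition or experimental setup.

# Illustrative sketch, not the paper's implementation: rank features with two
# statistical scores, select the best subset, and evaluate several classifiers.
import numpy as np
from scipy.stats import ttest_ind
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)   # binary stand-in data set
cls0, cls1 = X[y == 0], X[y == 1]

def t_test_score(a, b):
    """Absolute two-sample t statistic: larger means better class separation."""
    t, _ = ttest_ind(a, b, equal_var=False)
    return abs(t)

def overlap_score(a, b):
    """Hypothetical overlap measure: fraction of one class's values falling inside
    the other class's range, averaged over both directions. Negated so that a
    larger score still means a better (less overlapping) feature."""
    frac_a = np.mean((a >= b.min()) & (a <= b.max()))
    frac_b = np.mean((b >= a.min()) & (b <= a.max()))
    return -(frac_a + frac_b) / 2.0

def select_top_k(score_fn, k=5):
    """Indices of the k best features according to score_fn."""
    scores = [score_fn(cls0[:, j], cls1[:, j]) for j in range(X.shape[1])]
    return np.argsort(scores)[::-1][:k]

for name, score_fn in [("t-test", t_test_score), ("overlap", overlap_score)]:
    idx = select_top_k(score_fn)
    Xtr, Xte, ytr, yte = train_test_split(X[:, idx], y, random_state=0)
    accs = [clf.fit(Xtr, ytr).score(Xte, yte)
            for clf in (DecisionTreeClassifier(), KNeighborsClassifier(),
                        GaussianNB(), SVC())]
    print(f"{name}: max accuracy = {max(accs):.3f}, std = {np.std(accs):.3f}")

As in the paper's protocol, the maximum accuracy over the classifier pool and the standard deviation of the accuracies serve as the quality measures of each feature evaluation method.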

Keywords