Journal of Big Data (May 2024)

GB-AFS: graph-based automatic feature selection for multi-class classification via Mean Simplified Silhouette

  • David Levin,
  • Gonen Singer

DOI
https://doi.org/10.1186/s40537-024-00934-5
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 22

Abstract

This paper introduces a novel graph-based filter method for automatic feature selection (abbreviated as GB-AFS) for multi-class classification tasks. The method determines the minimum combination of features required to sustain prediction performance while maintaining complementary discriminating abilities between different classes. It does not require any user-defined parameters such as the number of features to select. The minimum number of features is selected using our newly developed Mean Simplified Silhouette (abbreviated as MSS) index, designed to evaluate the clustering results for the feature selection task. To illustrate the effectiveness and generality of the method, we applied the GB-AFS method using various combinations of statistical measures and dimensionality reduction techniques. The experimental results demonstrate the superior performance of the proposed GB-AFS over other filter-based techniques and automatic feature selection approaches, and demonstrate that the GB-AFS method is independent of the statistical measure or the dimensionality reduction technique chosen by the user. Moreover, the proposed method maintained the accuracy achieved when utilizing all features while using only 7–30% of the original features. This resulted in an average time saving ranging from 15% for the smallest dataset to 70% for the largest. Our code is available at https://github.com/davidlevinwork/gbfs/ .
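The abstract does not define the MSS index, so the following is only a minimal sketch of the generic simplified silhouette from the clustering literature, on which the name suggests MSS builds. It is not the authors' implementation. The function name `simplified_silhouette` and the toy data are hypothetical, and treating each point as one feature embedded in a low-dimensional space is an assumption drawn from the abstract's mention of clustering and dimensionality reduction.

```python
import math

def simplified_silhouette(points, labels, centroids):
    """Mean simplified silhouette of a clustering.

    Unlike the classic silhouette, distances are measured only from each
    point to the cluster centroids, not to every other point.
    """
    if len(centroids) < 2:
        raise ValueError("need at least two clusters")
    scores = []
    for p, own in zip(points, labels):
        a = math.dist(p, centroids[own])  # distance to own centroid
        # distance to the nearest centroid of any other cluster
        b = min(math.dist(p, c) for j, c in enumerate(centroids) if j != own)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Toy data (hypothetical): four points forming two well-separated pairs.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

# A clustering that matches the two pairs.
good = simplified_silhouette(points, [0, 0, 1, 1], [(0.0, 0.5), (10.0, 0.5)])

# A clustering that splits each pair across clusters.
bad = simplified_silhouette(points, [0, 1, 0, 1], [(5.0, 0.0), (5.0, 1.0)])

print(good, bad)
```

The clustering that respects the two pairs scores much higher than the one that splits them. An index of this kind can therefore compare candidate cluster counts without a user-supplied number of features. How GB-AFS aggregates such scores into MSS is specified in the paper, not here.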

Keywords