Machine Learning and Knowledge Extraction (Sep 2021)
A Critical Study on Stability Measures of Feature Selection with a Novel Extension of Lustgarten Index
Abstract
Stability of feature selection algorithm refers to its robustness to the perturbations of the training set, parameter settings or initialization. A stable feature selection algorithm is crucial for identifying the relevant feature subset of meaningful and interpretable features which is extremely important in the task of knowledge discovery. Though there are many stability measures reported in the literature for evaluating the stability of feature selection, none of them follows all the requisite properties of a stability measure. Among them, the Kuncheva index and its modifications, are widely used in practical problems. In this work, the merits and limitations of the Kuncheva index and its existing modifications (Lustgarten, Wald, nPOG/nPOGR, Nogueira) are studied and analysed with respect to the requisite properties of stability measure. One more limitation of the most recent modified similarity measure, Nogueira’s measure, has been pointed out. Finally, corrections to Lustgarten’s measure have been proposed to define a new modified stability measure that satisfies the desired properties and overcomes the limitations of existing popular similarity based stability measures. The effectiveness of the newly modified Lustgarten’s measure has been evaluated with simple toy experiments.
Keywords