IEEE Access (Jan 2024)
Effective Pruning for Top-k Feature Search on the Basis of SHAP Values
Abstract
With the ever-increasing influence of machine learning models, it has become necessary to explain their predictions. The SHAP framework provides a solution to this problem by assigning a score to each feature of a model such that it reflects the feature contribution to the prediction. Although SHAP is widely used, it is hampered by its computational cost when preserving model-agnosticism. This paper proposes a model-agnostic algorithm, TopShap, to efficiently approximate the SHAP values of the top-k most important features. TopShap uses confidence interval bounds of the approximate SHAP values to determine on the fly which features can no longer be part of the top-k and then removes them from the computation, thus saving computational resources. This cost reduction makes TopShap better suited than competing model-agnostic methods for top-k SHAP value computation. The evaluation of TopShap shows that it performs efficient pruning of the feature search space, in turn leading to a substantial reduction in the execution time when compared to the existing most efficient agnostic approach, Kernel SHAP. The experiments presented in this work cover a wide range of numbers of features and instances, using the following public datasets: Concrete, Wine quality, Appliances energy, PBMC gene expression, Mercedes, CT locations, and a synthetic regression. Various models were used to demonstrate model-agnosticism: Regression Forest, Multi-Layer Perceptron, RBF-kernel Support Vector Regression, and Stacked Generalization.
Keywords