Effective Pruning for Top-k Feature Search on the Basis of SHAP Values

Lisa Chabrier; Anton Crombach; Sergio Peignier; Christophe Rigotti

doi:10.1109/ACCESS.2024.3489958

IEEE Access (Jan 2024)

Effective Pruning for Top-k Feature Search on the Basis of SHAP Values

Lisa Chabrier,
Anton Crombach,
Sergio Peignier,
Christophe Rigotti

Affiliations

Lisa Chabrier: ORCiD; Inria, Centre de Lyon, Villeurbanne, France
Anton Crombach: ORCiD; Inria, Centre de Lyon, Villeurbanne, France
Sergio Peignier: ORCiD; INSA Lyon, INRAE, BF2I, UMR0203, Villeurbanne, France
Christophe Rigotti: ORCiD; Inria, Centre de Lyon, Villeurbanne, France

DOI: https://doi.org/10.1109/ACCESS.2024.3489958
Journal volume & issue: Vol. 12
pp. 163079 – 163092

Abstract

Read online

With the ever-increasing influence of machine learning models, it has become necessary to explain their predictions. The SHAP framework provides a solution to this problem by assigning a score to each feature of a model such that it reflects the feature contribution to the prediction. Although SHAP is widely used, it is hampered by its computational cost when preserving model-agnosticism. This paper proposes a model-agnostic algorithm, TopShap, to efficiently approximate the SHAP values of the top-k most important features. TopShap uses confidence interval bounds of the approximate SHAP values to determine on the fly which features can no longer be part of the top-k and then removes them from the computation, thus saving computational resources. This cost reduction makes TopShap better suited than competing model-agnostic methods for top-k SHAP value computation. The evaluation of TopShap shows that it performs efficient pruning of the feature search space, in turn leading to a substantial reduction in the execution time when compared to the existing most efficient agnostic approach, Kernel SHAP. The experiments presented in this work cover a wide range of numbers of features and instances, using the following public datasets: Concrete, Wine quality, Appliances energy, PBMC gene expression, Mercedes, CT locations, and a synthetic regression. Various models were used to demonstrate model-agnosticism: Regression Forest, Multi-Layer Perceptron, RBF-kernel Support Vector Regression, and Stacked Generalization.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639

About the journal

Abstract

Keywords