iScience (Sep 2022)
Calculation of exact Shapley values for support vector machines with Tanimoto kernel enables model interpretation
Abstract
Summary: The support vector machine (SVM) algorithm is popular in chemistry and drug discovery. SVM models have black-box character. Their predictions can be interpreted through feature weighting or the model-agnostic Shapley additive explanations (SHAP) formalism, which locally approximates Shapley values (SVs) originating from game theory. We introduce an algorithm termed SV-expressed Tanimoto similarity (SVETA) for the exact calculation of SVs to explain SVM models employing the Tanimoto kernel, the gold standard for the assessment of molecular similarity. For a model system, the exact calculation of SVs is demonstrated. In an SVM-based compound classification task from drug discovery, only a limited correlation between exact SVs and SHAP values is observed, prohibiting the use of approximate values for rationalizing predictions. For exemplary test compounds, atom-based mapping of prioritized features delineates coherent substructures that closely resemble those obtained by analyzing independently derived random forest models, thus providing consistent explanations.
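To make the notion of an exact SV for Tanimoto similarity concrete, the following is a minimal sketch and not the SVETA algorithm itself. It computes SVs by brute-force enumeration of all feature coalitions for the Tanimoto similarity of two binary fingerprints. The function names, the toy fingerprints, the choice of restricting both fingerprints to the coalition's features as the value function, and the convention that an empty union has similarity 0 are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def tanimoto(x, y, features):
    """Tanimoto similarity of binary vectors x and y restricted to `features`.

    Assumption: similarity is defined as 0 when the restricted union is empty.
    """
    inter = sum(1 for i in features if x[i] and y[i])
    union = sum(1 for i in features if x[i] or y[i])
    return inter / union if union else 0.0


def exact_shapley(x, y):
    """Brute-force exact Shapley values for each feature position.

    Enumerates every coalition S of the remaining features, so the cost
    grows exponentially with the number of features (toy sizes only).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                gain = tanimoto(x, y, coalition + (i,)) - tanimoto(x, y, coalition)
                phi[i] += weight * gain
    return phi


# Hypothetical toy fingerprints: features 0 and 3 are shared,
# features 1 and 2 are present in only one vector, feature 4 in neither.
x = [1, 1, 0, 1, 0]
y = [1, 0, 1, 1, 0]
phi = exact_shapley(x, y)
total = tanimoto(x, y, range(len(x)))
```

Under these assumptions the SVs sum to the full Tanimoto similarity (the efficiency property), shared features receive positive values, features present in only one fingerprint receive negative values, and features absent from both receive zero. Enumeration is only feasible for a handful of features, which is why a dedicated exact method is needed for fingerprints of realistic length.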