Interpreting the decisions of CNNs via influence functions

Aisha Aamir; Minija Tamosiunaite; Florentin Wörgötter

doi:10.3389/fncom.2023.1172883

Frontiers in Computational Neuroscience (Jul 2023)

Affiliations

Aisha Aamir: Third Institute of Physics – Biophysics and Bernstein Center for Computational Neuroscience, University of Göttingen, Göttingen, Germany
Minija Tamosiunaite: Third Institute of Physics – Biophysics and Bernstein Center for Computational Neuroscience, University of Göttingen, Göttingen, Germany; Department of Informatics, Vytautas Magnus University, Kaunas, Lithuania
Florentin Wörgötter: Third Institute of Physics – Biophysics and Bernstein Center for Computational Neuroscience, University of Göttingen, Göttingen, Germany

Journal volume & issue: Vol. 17

Abstract

An understanding of deep neural network decisions rests on the interpretability of the model, which provides explanations that are understandable to human beings and helps avoid biases in model predictions. This study investigates and interprets the model output based on images from the training dataset, i.e., it debugs the results of a network model in relation to the training dataset. Our objective was to understand the behavior (specifically, class prediction) of deep learning models through the analysis of perturbations of the loss function. We calculated influence scores for the VGG16 network at different hidden layers across three types of disturbances applied to the original images of the ImageNet dataset: texture, style, and background elimination. The global and layer-wise influence scores allowed the identification of the most influential training images for the given test set. We illustrate our findings with influence scores, highlighting the types of disturbances that bias the predictions of the network. According to our results, layer-wise influence analysis pairs well with local interpretability methods such as Shapley values to demonstrate significant differences between disturbed image subgroups. Particularly in an image classification task, our layer-wise interpretability approach plays a pivotal role in identifying classification bias in pre-trained convolutional neural networks, thus providing useful insights for retraining specific hidden layers.

Published in Frontiers in Computational Neuroscience

ISSN: 1662-5188 (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Medicine: Internal medicine: Neurosciences. Biological psychiatry. Neuropsychiatry
Website: http://www.frontiersin.org/computational_neuroscience
