网络与信息安全学报 (Dec 2023)
Visual explanation method for reversible neural networks
Abstract
The issue of model explainability has gained significant attention in understanding the vulnerabilities and opaque decision-making processes inherent in deep neural networks (DNN). While there has been considerable research on explainability for traditional DNN, the operation mechanism and explainability of reversible neural networks (RevNN) remain largely unexplored. Moreover, the existing explanation methods for traditional DNN are not suitable for RevNN and suffer from issues such as excessive noise and gradient saturation. To address these limitations, a visual explanation method for reversible neural networks (VERN) was proposed. VERN leverages the reversible property of RevNN and is based on the class-activation mapping mechanism. The correspondence between the feature map and the input image was explored by VERN, allowing the classification weights of regional feature maps to be mapped onto the corresponding regions of the input image. The importance of each region to the model's decision was thereby revealed, providing a basis for the model's decision-making. Experimental comparisons with other explanation methods on general datasets demonstrate that VERN achieves a more focused visual effect, surpassing the second-best methods by up to 7.80% in the average drop (AD) metric and up to 6.05% in the average increase (AI) metric on recognition tasks. VERN also achieves an 82.00% localization rate for the point of maximum heat value. Furthermore, VERN can be applied to explain traditional DNN and exhibits good scalability, improving the performance of other methods in explaining RevNN. In addition, adversarial attack analysis experiments show that adversarial attacks alter the decision basis of the model, which is reflected in the misalignment of the model's attention regions, thereby aiding the exploration of the operation mechanism of adversarial attacks.
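The abstract does not give implementation details, but a minimal sketch of the class-activation-mapping idea that VERN builds on may clarify the weighting-and-mapping step described above. The function name class_activation_map, the array shapes, and the nearest-neighbour upsampling below are illustrative assumptions, not the authors' implementation.

    # Illustrative sketch (assumptions noted above), not the paper's VERN code.
    import numpy as np

    def class_activation_map(feature_maps, class_weights, target_class, input_size):
        """Weight each feature-map channel by its classification weight for the
        target class, sum over channels, and map the result back onto the input.

        feature_maps : (C, H, W) activations from the last convolutional stage
        class_weights: (num_classes, C) weights of the final linear classifier
        target_class : index of the class being explained
        input_size   : (H_in, W_in) spatial size of the input image
        """
        C, H, W = feature_maps.shape
        w = class_weights[target_class]                   # (C,)
        cam = np.tensordot(w, feature_maps, axes=(0, 0))  # (H, W) weighted sum

        # Normalise to [0, 1] so the map can be rendered as a heat map.
        cam = np.maximum(cam, 0)
        cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

        # Nearest-neighbour upsampling to the input resolution, associating each
        # heat value with the input region that produced the feature response.
        rows = np.arange(input_size[0]) * H // input_size[0]
        cols = np.arange(input_size[1]) * W // input_size[1]
        return cam[np.ix_(rows, cols)]                    # (H_in, W_in) heat map

    # Example: a toy 8-channel feature map explained for class 3 of a 10-class model.
    rng = np.random.default_rng(0)
    heat = class_activation_map(rng.random((8, 7, 7)), rng.random((10, 8)), 3, (224, 224))
    print(heat.shape)  # (224, 224)

In this generic CAM formulation the mapping from feature positions back to input regions is a fixed upsampling; the contribution claimed for VERN is to exploit the reversible property of RevNN to establish that correspondence instead.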