IEEE Access (Jan 2022)
Collection-CAM: A Faster Region-Based Saliency Method Using Collection-Wise Mask Over Pyramidal Features
Abstract
Due to the black-box nature of deep networks, making explanations of their decision-making is extremely challenging. A solution is using post-hoc attention mechanisms with the deep network to verify the decision basis. However, those methods have problems such as gradient noise and false confidence. In addition, existing saliency methods either have limited performance by using only the last convolution layer or suffer from large computational overhead. In this work, we propose the Collection-CAM, which generates an attention map with low computational overhead while utilizing multi-level feature maps. First, the Collection-CAM searches for the most appropriate form of the partition through bottom-up clustering and clustering validation process. Then the Collection-CAM applies different pre-processing procedures on the shallow feature map and final feature map to overcome the false positiveness when applied without distinction. By combining collection-wise masks according to their contribution to the confidence score, the Collection-CAM completes the attention map generation process. Experimental results on ImageNet1k, UC Merced, and CUB dataset and various deep network models demonstrate that the Collection-CAM not only can synthesize a saliency map with a better visual explanation but also requires significantly lower computational overhead compared to those of region-based saliency methods.
Keywords