Information (Jul 2024)
DIPA: Adversarial Attack on DNNs by Dropping Information and Pixel-Level Attack on Attention
Abstract
Deep neural networks (DNNs) have shown remarkable performance across a wide range of fields, including image recognition, natural language processing, and speech processing. However, recent studies indicate that DNNs are highly vulnerable to well-crafted adversarial samples, which can cause incorrect classifications and predictions. Such samples can be so similar to the originals that they are nearly undetectable by human vision, which poses a significant security risk to DNNs deployed in the real world. Currently, the most common adversarial attack methods explicitly add adversarial perturbations to image samples, often producing adversarial samples that are easier for humans to distinguish. To address this issue, we aim to develop more effective methods for generating adversarial samples that remain undetectable to human vision. This paper proposes DIPA, a pixel-level adversarial attack method based on an attention mechanism and high-frequency information separation. Specifically, our approach constructs an attention suppression loss function and uses gradient information to identify and perturb sensitive pixels. By suppressing the model’s attention to the correct class, the attack misleads the neural network into focusing on irrelevant classes, leading to incorrect judgments. Unlike previous studies, DIPA strengthens the attack by separating out the imperceptible details in image samples, which hides the adversarial perturbation more effectively while maintaining a higher attack success rate. Our experimental results demonstrate that, under the extreme single-pixel attack scenario, DIPA achieves higher attack success rates across neural network models with various architectures. Furthermore, the visualization results and quantitative metrics show that DIPA can generate more imperceptible adversarial perturbations.
Keywords