IEEE Access (Jan 2022)

Intermediate-Layer Transferable Adversarial Attack With DNN Attention

  • Shanshan Yang,
  • Yu Yang,
  • Linna Zhou,
  • Rui Zhan,
  • Yufei Man

DOI
https://doi.org/10.1109/ACCESS.2022.3204696
Journal volume & issue
Vol. 10
pp. 95451 – 95461

Abstract

The widespread deployment of deep learning models in practice necessitates an assessment of their vulnerability, particularly in security-sensitive areas. As a result, transfer-based adversarial attacks have elicited increasing interest as a means of assessing the security of deep learning models. However, adversarial samples usually exhibit poor transferability across different models because they overfit the particular architecture and feature representation of the source model. To address this problem, the Intermediate Layer Attack with Attention guidance (IAA) is proposed to alleviate overfitting and enhance black-box transferability. IAA operates on an intermediate layer $l$ of the source model. Guided by the model's attention (i.e., gradients) to the features of layer $l$, the attack algorithm seeks and undermines the key features that are likely to be adopted by diverse architectures. Significantly, IAA improves existing white-box attacks without introducing noticeable degradation of visual perceptual quality: it maintains the white-box attack performance of the original algorithm while significantly enhancing its black-box transferability. Extensive experiments on ImageNet classifiers confirm the effectiveness of the method. The proposed IAA outperformed all state-of-the-art benchmarks in various white-box and black-box settings, e.g., improving the success rate of BIM by 29.65% against normally trained models and 27.16% against defense models.
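The attention-guided intermediate-layer idea described above can be sketched in PyTorch. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy model, the choice of layer $l$, and the hyperparameters (`eps`, `alpha`, `steps`) are all hypothetical. Here the "attention" is taken as the gradient of the classification loss with respect to the layer-$l$ feature map, and a BIM-style loop then perturbs the input so that the attended features move in the loss-increasing direction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-in for a source classifier (the paper uses ImageNet models;
# this architecture and the hooked layer are illustrative assumptions).
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),       # index 1 plays "layer l"
    nn.Flatten(), nn.Linear(8 * 16 * 16, 10),
)
model.eval()

# Capture the layer-l feature map on every forward pass.
captured = {}
model[1].register_forward_hook(lambda m, i, o: captured.update(l=o))

def iaa_sketch(x, y, eps=8 / 255, alpha=2 / 255, steps=5):
    """BIM-style attack guided by attention at layer l (illustrative)."""
    # 1) Attention: gradient of the loss w.r.t. the layer-l features.
    loss = F.cross_entropy(model(x), y)
    attn = torch.autograd.grad(loss, captured["l"])[0].detach()

    # 2) Iteratively perturb the input so the attended (key) features
    #    are pushed in the direction that raises the loss.
    x_adv = x.clone()
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        model(x_adv)                                  # refreshes captured["l"]
        feat_loss = (attn * captured["l"]).sum()      # linearized loss at layer l
        g = torch.autograd.grad(feat_loss, x_adv)[0]
        x_adv = x_adv + alpha * g.sign()              # gradient-sign step
        x_adv = x + (x_adv - x).clamp(-eps, eps)      # L-inf projection
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()

x = torch.rand(1, 3, 16, 16)
y = torch.tensor([3])
x_adv = iaa_sketch(x, y)
```

Working at layer $l$ rather than on the output logits is what the abstract credits for transferability: intermediate features are shared more broadly across architectures than the final decision surface, so disrupting them transfers better to unseen black-box models.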

Keywords