IEEE Access (Jan 2022)

Intermediate-Layer Transferable Adversarial Attack With DNN Attention

  • Shanshan Yang,
  • Yu Yang,
  • Linna Zhou,
  • Rui Zhan,
  • Yufei Man

DOI
https://doi.org/10.1109/ACCESS.2022.3204696
Journal volume & issue
Vol. 10
pp. 95451 – 95461

Abstract

The widespread deployment of deep learning models in practice necessitates an assessment of their vulnerability, particularly in security-sensitive areas. As a result, transfer-based adversarial attacks have elicited increasing interest as a means of assessing the security of deep learning models. However, adversarial samples usually exhibit poor transferability across different models because they overfit the particular architecture and feature representation of the source model. To address this problem, the Intermediate Layer Attack with Attention guidance (IAA) is proposed to alleviate overfitting and enhance black-box transferability. IAA operates on an intermediate layer $l$ of the source model. Guided by the model's attention (i.e., gradients) to the features of layer $l$, the attack algorithm seeks and undermines the key features that are likely to be adopted by diverse architectures. Significantly, IAA improves existing white-box attacks without introducing noticeable degradation of visual perceptual quality: it maintains the white-box attack performance of the original algorithm while significantly enhancing its black-box transferability. Extensive experiments on ImageNet classifiers confirm the effectiveness of the method. The proposed IAA outperformed all state-of-the-art benchmarks in various white-box and black-box settings, e.g., improving the success rate of BIM by 29.65% against normally trained models and 27.16% against defense models.
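The attention-guided intermediate-layer idea described above can be sketched in PyTorch. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy model, the choice of layer $l$, and the hyperparameters (`eps`, `alpha`, `steps`) are all hypothetical. Here the "attention" is taken as the gradient of the classification loss with respect to the layer-$l$ feature map, and a BIM-style loop then perturbs the input so that the attended features move in the loss-increasing direction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-in for a source classifier (the paper uses ImageNet models;
# this architecture and the hooked layer are illustrative assumptions).
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),       # index 1 plays "layer l"
    nn.Flatten(), nn.Linear(8 * 16 * 16, 10),
)
model.eval()

# Capture the layer-l feature map on every forward pass.
captured = {}
model[1].register_forward_hook(lambda m, i, o: captured.update(l=o))

def iaa_sketch(x, y, eps=8 / 255, alpha=2 / 255, steps=5):
    """BIM-style attack guided by attention at layer l (illustrative)."""
    # 1) Attention: gradient of the loss w.r.t. the layer-l features.
    loss = F.cross_entropy(model(x), y)
    attn = torch.autograd.grad(loss, captured["l"])[0].detach()

    # 2) Iteratively perturb the input so the attended (key) features
    #    are pushed in the direction that raises the loss.
    x_adv = x.clone()
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        model(x_adv)                                  # refreshes captured["l"]
        feat_loss = (attn * captured["l"]).sum()      # linearized loss at layer l
        g = torch.autograd.grad(feat_loss, x_adv)[0]
        x_adv = x_adv + alpha * g.sign()              # gradient-sign step
        x_adv = x + (x_adv - x).clamp(-eps, eps)      # L-inf projection
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()

x = torch.rand(1, 3, 16, 16)
y = torch.tensor([3])
x_adv = iaa_sketch(x, y)
```

Working at layer $l$ rather than on the output logits is what the abstract credits for transferability: intermediate features are shared more broadly across architectures than the final decision surface, so disrupting them transfers better to unseen black-box models.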

Keywords