Patterns (May 2022)

Disrupting adversarial transferability in deep neural networks

  • Christopher Wiedeman,
  • Ge Wang

Journal volume & issue
Vol. 3, no. 5
p. 100472

Abstract


Summary: Adversarial attack transferability is well recognized in deep learning. Previous work has partially explained transferability by identifying common adversarial subspaces and correlations between decision boundaries, but little is known beyond that. We propose that transferability between seemingly different models is due to a high linear correlation between the feature sets that different networks extract. In other words, two models trained on the same task that are distant in parameter space likely extract features in the same fashion, linked by trivial affine transformations between their latent spaces. Furthermore, we show that applying a feature correlation loss, which decorrelates the extracted features in corresponding latent spaces, can reduce the transferability of adversarial attacks between models, suggesting that the models complete tasks in semantically different ways. Finally, we propose a dual-neck autoencoder (DNA), which leverages this feature correlation loss to create two meaningfully different encodings of input information with reduced transferability.

The bigger picture: Recently, data-driven methods (especially deep learning) have impacted many fields, but these methods suffer from adversarial instability: models that perform well on test samples are easily fooled by adding small but specific noise patterns to those same samples. Furthermore, different models trained for the same task are often fooled by the same noise patterns (attack transferability). Thus, even ensembles can be easily attacked. Our findings suggest that attack transferability is due to a high correlation in the way information is extracted by different models, and we show that breaking this correlation reduces the transferability of attacks. Robustness against attackers is necessary for entrusting AI with life-sensitive tasks, such as medical diagnosis. By exploring transferability, our work helps explain the nature of adversarial attacks and offers a defense direction via decorrelated ensembles of models.
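The summary describes a feature correlation loss that penalizes linear correlation between the latent features of two models trained on the same task. The sketch below is an illustrative interpretation of that idea, not the authors' released code: the function `feature_correlation_loss`, the toy encoders, and the weight `lam` are all hypothetical choices, and the penalty here is simply the mean squared cross-correlation between the two (centered, normalized) latent spaces on a batch.

```python
import torch
import torch.nn as nn

def feature_correlation_loss(z_a: torch.Tensor, z_b: torch.Tensor) -> torch.Tensor:
    """Mean squared cross-correlation between two batches of latent codes.

    z_a, z_b: (batch, dim) feature matrices from two different models.
    The value is small when the two feature sets are linearly decorrelated.
    """
    # Center and scale each feature dimension across the batch.
    z_a = (z_a - z_a.mean(dim=0)) / (z_a.std(dim=0) + 1e-8)
    z_b = (z_b - z_b.mean(dim=0)) / (z_b.std(dim=0) + 1e-8)
    # Cross-correlation matrix between the two latent spaces.
    corr = (z_a.T @ z_b) / z_a.shape[0]  # shape: (dim_a, dim_b)
    return corr.pow(2).mean()

# Toy two-branch setup: two small encoders trained on the same classification
# task, with the decorrelation penalty added to the usual task losses.
encoder_a = nn.Sequential(nn.Flatten(), nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 32))
encoder_b = nn.Sequential(nn.Flatten(), nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 32))
head_a, head_b = nn.Linear(32, 10), nn.Linear(32, 10)

params = (list(encoder_a.parameters()) + list(encoder_b.parameters())
          + list(head_a.parameters()) + list(head_b.parameters()))
opt = torch.optim.Adam(params, lr=1e-3)
ce = nn.CrossEntropyLoss()
lam = 0.1  # weight of the decorrelation term (hypothetical value)

x = torch.randn(128, 1, 28, 28)        # stand-in batch of images
y = torch.randint(0, 10, (128,))       # stand-in labels

z_a, z_b = encoder_a(x), encoder_b(x)
loss = ce(head_a(z_a), y) + ce(head_b(z_b), y) + lam * feature_correlation_loss(z_a, z_b)
opt.zero_grad()
loss.backward()
opt.step()
```

Under this reading, the two branches are trained to solve the task well individually while keeping their latent representations linearly unrelated, so an adversarial perturbation crafted against one branch is less likely to transfer to the other.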

Keywords