Patterns (May 2022)

Disrupting adversarial transferability in deep neural networks

  • Christopher Wiedeman,
  • Ge Wang

Journal volume & issue
Vol. 3, no. 5
p. 100472

Abstract


Summary: Adversarial attack transferability is well recognized in deep learning. Previous work has partially explained transferability by identifying common adversarial subspaces and correlations between decision boundaries, but little is known beyond that. We propose that transferability between seemingly different models is due to a high linear correlation between the feature sets that different networks extract. In other words, two models trained on the same task that are distant in parameter space likely extract features in the same fashion, linked by trivial affine transformations between their latent spaces. Furthermore, we show that applying a feature correlation loss, which decorrelates the extracted features in corresponding latent spaces, can reduce the transferability of adversarial attacks between models, suggesting that the models complete tasks in semantically different ways. Finally, we propose a dual-neck autoencoder (DNA), which leverages this feature correlation loss to create two meaningfully different encodings of input information with reduced transferability.

The bigger picture: Recently, data-driven methods (especially deep learning) have impacted many fields, but these methods suffer from adversarial instability: models that perform well on test samples are easily fooled by adding small but specific noise patterns to those same samples. Furthermore, different models trained for the same task are often fooled by the same noise patterns (attack transferability). Thus, even ensembles can be easily attacked. Our findings suggest that attack transferability is due to a high correlation in the way information is extracted by different models, and we show that breaking this correlation reduces the transferability of attacks. Robustness against attackers is necessary for entrusting AI with life-sensitive tasks, such as medical diagnosis. By exploring transferability, our work helps explain the nature of adversarial attacks and offers a defense direction via decorrelated ensembles of models.
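The summary describes a feature correlation loss that penalizes linear correlation between the latent features of two models trained on the same task. The sketch below is an illustrative interpretation of that idea, not the authors' released code: the function `feature_correlation_loss`, the toy encoders, and the weight `lam` are all hypothetical choices, and the penalty here is simply the mean squared cross-correlation between the two (centered, normalized) latent spaces on a batch.

```python
import torch
import torch.nn as nn

def feature_correlation_loss(z_a: torch.Tensor, z_b: torch.Tensor) -> torch.Tensor:
    """Mean squared cross-correlation between two batches of latent codes.

    z_a, z_b: (batch, dim) feature matrices from two different models.
    The value is small when the two feature sets are linearly decorrelated.
    """
    # Center and scale each feature dimension across the batch.
    z_a = (z_a - z_a.mean(dim=0)) / (z_a.std(dim=0) + 1e-8)
    z_b = (z_b - z_b.mean(dim=0)) / (z_b.std(dim=0) + 1e-8)
    # Cross-correlation matrix between the two latent spaces.
    corr = (z_a.T @ z_b) / z_a.shape[0]  # shape: (dim_a, dim_b)
    return corr.pow(2).mean()

# Toy two-branch setup: two small encoders trained on the same classification
# task, with the decorrelation penalty added to the usual task losses.
encoder_a = nn.Sequential(nn.Flatten(), nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 32))
encoder_b = nn.Sequential(nn.Flatten(), nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 32))
head_a, head_b = nn.Linear(32, 10), nn.Linear(32, 10)

params = (list(encoder_a.parameters()) + list(encoder_b.parameters())
          + list(head_a.parameters()) + list(head_b.parameters()))
opt = torch.optim.Adam(params, lr=1e-3)
ce = nn.CrossEntropyLoss()
lam = 0.1  # weight of the decorrelation term (hypothetical value)

x = torch.randn(128, 1, 28, 28)        # stand-in batch of images
y = torch.randint(0, 10, (128,))       # stand-in labels

z_a, z_b = encoder_a(x), encoder_b(x)
loss = ce(head_a(z_a), y) + ce(head_b(z_b), y) + lam * feature_correlation_loss(z_a, z_b)
opt.zero_grad()
loss.backward()
opt.step()
```

Under this reading, the two branches are trained to solve the task well individually while keeping their latent representations linearly unrelated, so an adversarial perturbation crafted against one branch is less likely to transfer to the other.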

Keywords