AI (Jun 2024)

Minimally Distorted Adversarial Images with a Step-Adaptive Iterative Fast Gradient Sign Method

  • Ning Ding,
  • Knut Möller

DOI
https://doi.org/10.3390/ai5020046
Journal volume & issue
Vol. 5, no. 2
pp. 922–937

Abstract

The safety and robustness of convolutional neural networks (CNNs) have raised increasing concerns, especially in safety-critical areas such as medical applications. Although CNNs are efficient at image classification, their predictions are often sensitive to minor modifications of the image that are invisible to human observers. Thus, a modified, corrupted image can be visually indistinguishable from the legitimate image for humans, yet fool the CNN into making a wrong prediction. Such modified images are called adversarial images throughout this paper. A popular way to generate adversarial images is to backpropagate the loss gradient and use it to modify the input image. Usually, only the direction of the gradient and a fixed step size are used to determine the perturbation (FGSM, fast gradient sign method), or the FGSM is applied multiple times to craft stronger perturbations that change the model's classification (i-FGSM). However, if the step size is too large, the minimum perturbation of the image may be missed during the gradient search. To find exact, minimal perturbations that change the classification, in this paper we suggest starting the FGSM with a small step size and adapting the step size over the iterations. A few decay algorithms were taken from the literature and compared with a novel approach based on an index tracking the loss status; in total, three tracking functions were compared. The experiments show that our loss-adaptive decay algorithms find adversaries with more than a 90% success rate while generating smaller perturbations to fool the CNNs.
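To make the step-adaptive idea concrete, the sketch below shows one possible form of an iterative FGSM whose step size decays based on the loss trajectory. It is a minimal illustration, not the paper's method: the model, the batch-of-one image in [0, 1], and the specific decay rule (halving the step whenever the loss stops increasing) are all assumptions standing in for the paper's tracking functions.

```python
# Minimal PyTorch sketch of a step-adaptive iterative FGSM.
# Hypothetical decay rule: halve the step when the loss stops increasing;
# the paper's actual tracking functions are not reproduced here.
import torch
import torch.nn.functional as F

def step_adaptive_ifgsm(model, image, label, eps0=1e-3, decay=0.5, max_iters=100):
    """Search for a minimally perturbed adversarial image (batch size 1).

    Starts with a small step size eps0 and shrinks it whenever the loss
    stagnates, stopping as soon as the model's prediction flips.
    """
    x_adv = image.clone().detach()
    step = eps0
    prev_loss = None
    for _ in range(max_iters):
        x_adv.requires_grad_(True)
        logits = model(x_adv)
        if logits.argmax(dim=1).item() != label.item():
            return x_adv.detach()  # classification changed: adversary found
        loss = F.cross_entropy(logits, label)
        loss.backward()
        with torch.no_grad():
            # FGSM update: move along the sign of the loss gradient
            x_adv = (x_adv + step * x_adv.grad.sign()).clamp(0.0, 1.0)
        # assumed loss-tracking rule: decay the step when the loss stagnates
        if prev_loss is not None and loss.item() <= prev_loss:
            step *= decay
        prev_loss = loss.item()
    return x_adv.detach()  # budget exhausted; result may not be adversarial
```

Starting small and decaying on stagnation reflects the abstract's reasoning: a large fixed step can overshoot the minimum perturbation, whereas a shrinking step lets the search settle near the decision boundary.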

Keywords