Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki (Jun 2024)
An approach to detecting L0-optimized attacks on image processing neural networks by means of mathematical statistics
Abstract
Artificial intelligence has become widespread in image processing tasks. At the same time, the number of vulnerabilities in systems implementing these artificial intelligence technologies is growing (the attack surface is expanding). The main threats to information security can be realized by introducing malicious perturbations into the input data, regardless of their type. To detect such attacks, approaches and methods have been developed based, in particular, on the use of an autoencoder or on the analysis of the layers of the target neural network. The disadvantage of existing methods, which significantly narrows their scope of application, is that they are bound to a specific dataset or model architecture. This paper addresses the problem of expanding the scope (increasing the scalability) of methods for detecting L0-optimized perturbations introduced by unconventional pixel attacks. An approach to detecting these attacks using statistical analysis of the input data, independent of the model and dataset, is proposed. It is assumed that the perturbation pixels embedded in an image by an L0-optimized attack will appear as both local and global outliers. Outlier detection is performed using statistical metrics, namely deviation from nearest neighbors and the Mahalanobis distance. The score of each pixel (anomaly score) is computed as the product of these metrics. A threshold clipping algorithm is used to detect an attack: if a pixel is found whose score exceeds a certain threshold, the image is recognized as distorted. The approach was tested on the CIFAR-10 and MNIST datasets. The developed method demonstrated high accuracy in detecting attacks: on the CIFAR-10 dataset, the accuracy of detecting a one-pixel attack was 94.3 %, and of detecting a Jacobian-based Saliency Map Attack (JSMA) 98.3 %. The proposed approach is also applicable to locating the modified pixels themselves. It is designed to detect one-pixel attacks and JSMA, but can potentially be used for any L0-optimized distortions, and it works for both color and grayscale images regardless of the dataset. Since only the input data is used for detection, the approach is potentially universal with respect to the neural network architecture. It can also be used to screen a training sample for images modified by unconventional adversarial attacks before a model is formed.
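For illustration, below is a minimal sketch of the scoring scheme the abstract describes, assuming an 8-connected neighborhood for the local metric and the image-wide color distribution for the Mahalanobis distance; the function names, the neighborhood choice, and the parameters are hypothetical and not taken from the paper.

    import numpy as np

    def anomaly_scores(img, eps=1e-6):
        # Score each pixel of an H x W x C image as the product of a
        # local outlier metric (deviation from its spatial neighbors)
        # and a global one (Mahalanobis distance from the image-wide
        # color distribution). Illustrative sketch only.
        imgf = img.astype(np.float64)
        h, w, c = imgf.shape
        x = imgf.reshape(-1, c)

        # Global metric: Mahalanobis distance of each pixel value from
        # the distribution of all pixel values in the image.
        mu = x.mean(axis=0)
        cov = np.cov(x, rowvar=False).reshape(c, c) + eps * np.eye(c)
        inv = np.linalg.inv(cov)
        d = x - mu
        maha = np.sqrt(np.einsum('ij,jk,ik->i', d, inv, d)).reshape(h, w)

        # Local metric: distance of each pixel from the mean of its
        # 8-connected neighbors (edge pixels use replicated borders).
        pad = np.pad(imgf, ((1, 1), (1, 1), (0, 0)), mode='edge')
        neigh = sum(pad[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dy, dx) != (0, 0)) / 8.0
        local = np.linalg.norm(imgf - neigh, axis=2)

        return local * maha  # anomaly score per pixel

    def is_attacked(img, threshold):
        # Threshold clipping: the image is flagged as distorted as soon
        # as any single pixel's score exceeds the threshold.
        return anomaly_scores(img).max() > threshold

Here threshold is an assumed user-chosen parameter; in practice it would be calibrated on clean images (a procedure the abstract does not detail). Grayscale images are handled by the same code with C = 1.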
Keywords