Sensors (May 2022)
Clustering Approach for Detecting Multiple Types of Adversarial Examples
Abstract
With intentional feature perturbations to a deep learning model, the adversary generates an adversarial example to deceive the deep learning model. As an adversarial example has recently been considered in the most severe problem of deep learning technology, its defense methods have been actively studied. Such effective defense methods against adversarial examples are categorized into one of the three architectures: (1) model retraining architecture; (2) input transformation architecture; and (3) adversarial example detection architecture. Especially, defense methods using adversarial example detection architecture have been actively studied. This is because defense methods using adversarial example detection architecture do not make wrong decisions for the legitimate input data while others do. In this paper, we note that current defense methods using adversarial example detection architecture can classify the input data into only either a legitimate one or an adversarial one. That is, the current defense methods using adversarial example detection architecture can only detect the adversarial examples and cannot classify the input data into multiple classes of data, i.e., legitimate input data and various types of adversarial examples. To classify the input data into multiple classes of data while increasing the accuracy of the clustering model, we propose an advanced defense method using adversarial example detection architecture, which extracts the key features from the input data and feeds the extracted features into a clustering model. From the experimental results under various application datasets, we show that the proposed method can detect the adversarial examples while classifying the types of adversarial examples. We also show that the accuracy of the proposed method outperforms the accuracy of recent defense methods using adversarial example detection architecture.
Keywords