IEEE Access (Jan 2020)
Detecting Backdoor Attacks via Class Difference in Deep Neural Networks
Abstract
A backdoor attack causes a deep neural network to misclassify data containing a specific trigger by injecting malicious training data, which include that trigger, into the model's training process. Under such an attack, the network correctly classifies normal, trigger-free data but misclassifies data containing the trigger as a target class chosen by the attacker. In this paper, I propose a defense method against backdoor attacks that uses a detection model. The method detects backdoor samples by comparing the output of the target model with that of a detection model trained on a clean subset of the original training data. This defense requires neither trigger reverse-engineering nor access to the entire training dataset. For the experimental environment, I used the TensorFlow machine-learning library with the MNIST and Fashion-MNIST datasets. The results show that when the detection model is trained on only 200 samples of the clean training data, the proposed method achieves detection rates of 70.1% and 74.4% for backdoor samples in MNIST and Fashion-MNIST, respectively.
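The comparison step described above can be illustrated with a minimal sketch, assuming `target_model` (the possibly backdoored network) and `detection_model` (trained on the small clean subset) are Keras classifiers; the function and variable names here are illustrative, not taken from the paper's actual implementation.

```python
import numpy as np
import tensorflow as tf


def flag_backdoor_samples(target_model: tf.keras.Model,
                          detection_model: tf.keras.Model,
                          x: np.ndarray) -> np.ndarray:
    """Return a boolean mask marking inputs whose predicted classes differ.

    A class disagreement suggests the input may carry a backdoor trigger:
    the backdoored target model maps it to the attacker's target class,
    while the clean detection model still assigns the original class.
    """
    target_pred = np.argmax(target_model.predict(x), axis=1)
    clean_pred = np.argmax(detection_model.predict(x), axis=1)
    return target_pred != clean_pred
```

Inputs flagged by this mask would then be treated as suspected backdoor samples; the detection rates reported above correspond to how often triggered inputs produce such a class disagreement.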
Keywords