IEEE Access (Jan 2020)
Detecting Backdoor Attacks via Class Difference in Deep Neural Networks
Abstract
A backdoor attack causes a deep neural network to misclassify data containing a specific trigger by injecting malicious training data, which include that trigger, into the model's training process. Under such an attack, the network correctly classifies normal, trigger-free data but misclassifies data containing the trigger as a target class chosen by the attacker. In this paper, I propose a defense method against backdoor attacks that uses a detection model. The method detects backdoor samples by comparing the output of the target model with that of a detection model trained on a clean subset of the original training data. This defense requires neither trigger reverse-engineering nor access to the entire training dataset. For the experimental environment, I used the TensorFlow machine-learning library with the MNIST and Fashion-MNIST datasets. The results show that when the detection model is trained on only 200 samples of the clean training data, the proposed method achieves detection rates of 70.1% and 74.4% for backdoor samples in MNIST and Fashion-MNIST, respectively.
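The comparison step described above can be illustrated with a minimal sketch, assuming `target_model` (the possibly backdoored network) and `detection_model` (trained on the small clean subset) are Keras classifiers; the function and variable names here are illustrative, not taken from the paper's actual implementation.

```python
import numpy as np
import tensorflow as tf


def flag_backdoor_samples(target_model: tf.keras.Model,
                          detection_model: tf.keras.Model,
                          x: np.ndarray) -> np.ndarray:
    """Return a boolean mask marking inputs whose predicted classes differ.

    A class disagreement suggests the input may carry a backdoor trigger:
    the backdoored target model maps it to the attacker's target class,
    while the clean detection model still assigns the original class.
    """
    target_pred = np.argmax(target_model.predict(x), axis=1)
    clean_pred = np.argmax(detection_model.predict(x), axis=1)
    return target_pred != clean_pred
```

Inputs flagged by this mask would then be treated as suspected backdoor samples; the detection rates reported above correspond to how often triggered inputs produce such a class disagreement.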
Keywords