Textual Backdoor Defense via Poisoned Sample Recognition

Kun Shao; Yu Zhang; Junan Yang; Hui Liu

doi:10.3390/app11219938

Applied Sciences (Oct 2021)

Textual Backdoor Defense via Poisoned Sample Recognition

Kun Shao,
Yu Zhang,
Junan Yang,
Hui Liu

Affiliations

Kun Shao: Institute of Electronic Countermeasure, National University of Defense Technology, Hefei 230037, China
Yu Zhang: Institute of Electronic Countermeasure, National University of Defense Technology, Hefei 230037, China
Junan Yang: Institute of Electronic Countermeasure, National University of Defense Technology, Hefei 230037, China
Hui Liu: Institute of Electronic Countermeasure, National University of Defense Technology, Hefei 230037, China

DOI: https://doi.org/10.3390/app11219938
Journal volume & issue: Vol. 11, no. 21
p. 9938

Abstract

Read online

Deep learning models are vulnerable to backdoor attacks. The success rate of textual backdoor attacks based on data poisoning in existing research is as high as 100%. In order to enhance the natural language processing model’s defense against backdoor attacks, we propose a textual backdoor defense method via poisoned sample recognition. Our method consists of two parts: the first step is to add a controlled noise layer after the model embedding layer, and to train a preliminary model with incomplete or no backdoor embedding, which reduces the effectiveness of poisoned samples. Then, we use the model to initially identify the poisoned samples in the training set so as to narrow the search range of the poisoned samples. The second step uses all the training data to train an infection model embedded in the backdoor, which is used to reclassify the samples selected in the first step, and finally identify the poisoned samples. Through detailed experiments, we have proved that our defense method can effectively defend against a variety of backdoor attacks (character-level, word-level and sentence-level backdoor attacks), and the experimental effect is better than the baseline method. For the BERT model trained by the IMDB dataset, this method can even reduce the success rate of word-level backdoor attacks to 0%.

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci

About the journal

Abstract

Keywords