Jisuanji Kexue (Computer Science), Aug 2022

Class Discriminative Universal Adversarial Attack for Text Classification

  • HAO Zhi-rong, CHEN Long, HUANG Jia-cheng

DOI: https://doi.org/10.11896/jsjkx.220200077
Journal volume & issue: Vol. 49, no. 8, pp. 323-329

Abstract


A universal adversarial attack (UAA) fools a text classifier with a single fixed perturbation sequence appended to any input. Existing UAAs, however, attack textual examples from all classes indiscriminately, which easily attracts the attention of defense systems. For a stealthier attack, a simple and efficient class discriminative universal adversarial attack method is proposed, which has an obvious attack effect on examples from the targeted classes while having only limited influence on the non-targeted classes. In the white-box setting, multiple candidate perturbation sequences are searched using the average gradient of the perturbation sequence over each batch, and the perturbation sequence with the smallest loss is selected for the next iteration, until no new perturbation sequence is generated. Comprehensive experiments are conducted on four public Chinese and English datasets with TextCNN and BiLSTM models to evaluate the effectiveness of the proposed method. Experimental results show that the proposed attack can discriminatively attack the targeted and non-targeted classes, and exhibits a certain degree of transferability.
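The iterative search sketched in the abstract (average batch gradient → several candidate perturbations → keep the candidate with the smallest loss → stop when no candidate improves) can be illustrated with a minimal sketch. This is not the paper's method: it assumes a toy logistic-regression classifier over continuous features (a stand-in for a text model's embedding space), a single perturbation vector `delta` added to every input, and an illustrative class-discriminative objective that raises the loss on the targeted-class batch while penalizing loss on the non-targeted batch. All names, hyperparameters, and the L2-ball constraint are assumptions for the sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ce(p, y, eps=1e-9):
    # binary cross-entropy, averaged over the batch
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def class_discriminative_uap(w, b, X_t, y_t, X_n, y_n,
                             lam=1.0, radius=3.0,
                             step_sizes=(0.5, 1.0, 2.0), max_iter=100):
    """Greedy search for a universal perturbation `delta` that fools the
    targeted-class batch X_t while preserving predictions on X_n.
    Objective to minimize: -CE(targeted) + lam * CE(non-targeted)."""
    def project(d):  # keep delta inside an L2 ball (illustrative constraint)
        n = np.linalg.norm(d)
        return d if n <= radius else d * (radius / n)

    def objective(d):
        p_t = sigmoid((X_t + d) @ w + b)
        p_n = sigmoid((X_n + d) @ w + b)
        return -ce(p_t, y_t) + lam * ce(p_n, y_n)

    delta = np.zeros(X_t.shape[1])
    best = objective(delta)
    for _ in range(max_iter):
        # average gradient of the objective w.r.t. delta over each batch
        # (for logistic regression, dCE/ddelta = mean(p - y) * w)
        p_t = sigmoid((X_t + delta) @ w + b)
        p_n = sigmoid((X_n + delta) @ w + b)
        grad = -np.mean(p_t - y_t) * w + lam * np.mean(p_n - y_n) * w
        # several candidate perturbations; keep the one with the smallest loss
        cands = [project(delta - s * grad) for s in step_sizes]
        losses = [objective(c) for c in cands]
        i = int(np.argmin(losses))
        if losses[i] >= best:  # no new perturbation improves: stop
            break
        delta, best = cands[i], losses[i]
    return delta

# Toy data: targeted class (label 1) is separable from non-targeted (label 0).
w, b = np.array([2.0, 0.0]), 0.0
X_t = np.array([[1.0, 0.0], [1.5, 0.5], [0.8, -0.3], [2.0, 1.0]])
y_t = np.ones(4)
X_n = -X_t.copy()
y_n = np.zeros(4)

delta = class_discriminative_uap(w, b, X_t, y_t, X_n, y_n)
fooled = np.mean(sigmoid((X_t + delta) @ w + b) < 0.5)     # targeted flipped
preserved = np.mean(sigmoid((X_n + delta) @ w + b) < 0.5)  # non-targeted intact
```

In this toy setup the search drives `delta` along the direction that flips the targeted class while the `lam`-weighted preservation term keeps the non-targeted batch correctly classified; the actual method operates on discrete token sequences rather than a continuous vector.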

Keywords