IEEE Access (Jan 2020)
Deep Learning Based Robust Text Classification Method via Virtual Adversarial Training
Abstract
The existing methods of generating adversarial texts usually change the original meanings of texts significantly and even generate the unreadable texts. These less readable adversarial texts can misclassify the machine classifier successfully, but they cannot deceive the human observers very well. In this paper, we propose a novel method that generates readable adversarial texts with some perturbations that can also confuse human observers successfully. Based on the continuous bag-of-words (CBOW) model, the proposed method looks for the appropriate perturbations to generate the adversarial texts through controlling the perturbation direction vectors. Meanwhile, we apply adversarial training to regularize the classification model and extend it to semi-supervised tasks with virtual adversarial training. Experiments are conducted to show that the generated adversaries are interpretable and confused to humans and the virtual adversarial training effectively improves the robustness of the model.
Keywords