Journal of King Saud University: Computer and Information Sciences (Sep 2023)
An adversarial training method for text classification
Abstract
Text classification is an active topic in text data mining, but current methods for deducing sentence polarity have two major shortcomings: on the one hand, there is a lack of large, well-curated corpora; on the other hand, current deep-learning-based solutions are particularly vulnerable to adversarial examples. To overcome these limitations, we propose an adversarial training method, HNN-GRAT (Hierarchical Neural Network and Gradient Reversal), for text classification. Firstly, a Robustly Optimized BERT Pretraining Approach (RoBERTa) pretrained model is used to extract text features and feature gradient information; secondly, the original gradient is passed through a gradient reversal layer designed to produce the reversed gradient; finally, the original and reversed gradients are fused to obtain the model's new gradient. The HNN-GRAT method is tested on three real datasets against five attack methods; compared with the RoBERTa pretrained baseline, HNN-GRAT improves robust accuracy and reduces the probability of the model being successfully attacked. In addition, compared with six text defense methods, HNN-GRAT achieves the best Boa and Succ (for example, under the DeepWordBug attack on the AGNEWS, IMDB, and SST-2 datasets, Boa improves by up to 41.50%, 67.50%, and 28.15%, while Succ drops by 55.90%, 27.45%, and 69.89%, respectively).
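The core mechanism described above (a gradient reversal layer followed by gradient fusion) can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the reversal scale `lam` and the convex-combination fusion rule with coefficient `alpha` are assumptions, since the abstract does not specify the exact fusion formula.

```python
import numpy as np

def gradient_reversal(grad, lam=1.0):
    # A gradient reversal layer acts as the identity in the forward pass;
    # in the backward pass it multiplies the incoming gradient by -lam.
    return -lam * grad

def fuse_gradients(grad, reversed_grad, alpha=0.5):
    # Hypothetical fusion rule (the abstract does not give the exact
    # combination): a convex combination of the original and reversed
    # gradients, weighted by alpha.
    return alpha * grad + (1.0 - alpha) * reversed_grad

# Toy gradient vector standing in for the feature gradient from RoBERTa.
g = np.array([0.2, -0.5, 1.0])
g_rev = gradient_reversal(g, lam=1.0)      # reversed gradient
g_new = fuse_gradients(g, g_rev, alpha=0.7)  # fused gradient used for the update
```

With `alpha=0.7` and `lam=1.0`, the fused gradient reduces to `(2*alpha - 1) * g`, i.e. a damped version of the original gradient, which illustrates how mixing in the reversed direction shrinks the update step.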