IEEE Access (Jan 2024)
Forging the Forger: An Attempt to Improve Authorship Verification via Data Augmentation
Abstract
Authorship Verification (AV) is a text classification task concerned with inferring whether a candidate text has been written by one specific author (A) or by someone else ($\overline{A}$). It has been shown that many AV systems are vulnerable to adversarial attacks, where a malicious author actively tries to fool the classifier by either concealing their writing style, or by imitating the style of another author. In this paper, we investigate the potential benefits of augmenting the classifier training set with (negative) synthetic examples. These synthetic examples are generated to imitate the style of A. We analyze the improvements that this augmentation brings to the classifier predictions in the task of AV in an adversarial setting. In particular, we experiment with three different generator architectures (one based on Recurrent Neural Networks, another based on small-scale transformers, and another based on the popular GPT model) and with two training strategies (one inspired by standard Language Models, and another inspired by Wasserstein Generative Adversarial Networks). We evaluate our hypothesis on five datasets (three of which have been specifically collected to represent an adversarial setting) and using two learning algorithms for the AV classifier (Support Vector Machines and Convolutional Neural Networks). This experimentation yields negative results, revealing that, although our methodology proves effective in many adversarial settings, its benefits are too sporadic for a pragmatic application.
Keywords