Enhancing Machine-Generated Text Detection: Adversarial Fine-Tuning of Pre-Trained Language Models

Dong Hee Lee; Beakcheol Jang

doi:10.1109/ACCESS.2024.3396820

IEEE Access (Jan 2024)

Affiliations

Dong Hee Lee: Graduate School of Information, Yonsei University, Seoul, South Korea
Beakcheol Jang: Graduate School of Information, Yonsei University, Seoul, South Korea

Journal volume & issue: Vol. 12, pp. 65333–65340

Abstract

Advances in large language models (LLMs) have revolutionized the field of natural language processing. However, text generated by LLMs can cause problems such as fake news, misinformation, and social media spam. Detecting machine-generated text is also becoming increasingly difficult because LLMs produce text that closely resembles human writing. We propose a method for detecting machine-generated text more effectively by applying adversarial training (AT) to pre-trained language models (PLMs) such as Bidirectional Encoder Representations from Transformers (BERT). We generate adversarial examples that appear to have been modified by humans and use them to train the PLMs, improving their detection capability. The proposed method was validated across several datasets and experiments. Compared with traditional fine-tuning, it reduced the probability of misclassifying machine-generated text by about 10% on average. The model also remained robust when the text was generated from input tokens of different lengths and under different training data ratios. We suggest future research directions for applying AT to other languages and other types of language models. This study opens new possibilities for applying AT to machine-generated text detection and classification and contributes to building more effective detection models.
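The idea of augmenting training data with machine-generated text that looks as if a human edited it can be illustrated with a minimal sketch. The abstract does not say how the adversarial examples are built, so the perturbations here (adjacent-character swaps and word drops) are assumptions chosen only for illustration. The function names `perturb` and `build_adversarial_set` and the label convention (1 = machine-generated, 0 = human) are also hypothetical.

```python
import random


def perturb(text, rate=0.1, seed=0):
    """Apply simple human-like edits to a text (illustrative assumption only).

    Each word is dropped with probability about rate/2, or has two adjacent
    inner characters swapped with probability about rate/2. The first word
    is never dropped, so the result is never empty for non-empty input.
    """
    rng = random.Random(seed)  # local generator keeps the output reproducible
    out = []
    for word in text.split():
        r = rng.random()
        if r < rate / 2 and out:
            continue  # drop this word
        if r < rate and len(word) > 3:
            i = rng.randrange(1, len(word) - 2)  # position of an inner character
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        out.append(word)
    return " ".join(out)


def build_adversarial_set(examples, rate=0.1, seed=0):
    """Return the original (text, label) pairs plus perturbed machine-generated copies.

    The perturbed copies keep label 1: the detector should still flag
    machine-generated text after it has been lightly edited.
    """
    augmented = list(examples)
    for idx, (text, label) in enumerate(examples):
        if label == 1:
            augmented.append((perturb(text, rate, seed + idx), 1))
    return augmented


data = [
    ("Large language models write fluent text", 1),  # machine-generated
    ("i wrote this myself", 0),                      # human-written
]
for text, label in build_adversarial_set(data, rate=0.3, seed=42):
    print(label, text)
```

The augmented list could then be passed to an ordinary fine-tuning loop for a classifier such as BERT. That step is omitted here because the training setup is not described in the abstract.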

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639