Evaluation of deep neural network architectures for authorship obfuscation of Portuguese texts

Antônio Marcos Rodrigues Franco; Ítalo Cunha; Leonardo B. Oliveira

Natural Language Processing Journal (Dec 2024)

Affiliations

Antônio Marcos Rodrigues Franco: Universidade Federal de Minas Gerais, Av. Presidente Antonio Carlos, 6627, Belo Horizonte, 31270010, Minas Gerais, Brazil
Ítalo Cunha (corresponding author): Universidade Federal de Minas Gerais, Av. Presidente Antonio Carlos, 6627, Belo Horizonte, 31270010, Minas Gerais, Brazil
Leonardo B. Oliveira: Universidade Federal de Minas Gerais, Av. Presidente Antonio Carlos, 6627, Belo Horizonte, 31270010, Minas Gerais, Brazil

Journal volume & issue: Vol. 9, p. 100107

Abstract

Preserving authorship anonymity is paramount to protect activists, freedom of expression, and critical journalism. Although there are several mechanisms to provide anonymity on the Internet, one can still identify anonymous authors through their writing style. With the advances in neural network and natural language processing research, classifiers are becoming increasingly successful at identifying the author of a text. On the other hand, new approaches that use recurrent neural networks for automatic generation of obfuscated texts have also arisen to fight anonymity adversaries. In this work, we evaluate two approaches that use neural networks to generate obfuscated texts. The first approach uses Generative Adversarial Networks to train an encoder–decoder to transform sentences from an input style into a target style. The second one trains an autoencoder with a Gradient Reversal Layer to learn invariant representations. In our experiments, we compared the efficiency of both techniques when removing the stylistic attributes of a text and preserving its original semantics. Our evaluation on real texts clarifies each technique’s trade-offs for Portuguese texts and provides guidance on practical deployment.
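The abstract does not spell out how a Gradient Reversal Layer works, so the following is only an illustrative sketch and not the authors' implementation. It assumes a toy scalar "encoder" weight `w_enc`, a toy scalar "adversary" weight `v_adv` standing in for an authorship classifier, and a squared-error loss. All of these names and choices are hypothetical. The layer is the identity on the forward pass and multiplies the gradient by `-lam` on the backward pass, so a gradient-descent step on the encoder pushes the adversary's loss up rather than down.

```python
class GradientReversal:
    """Identity on the forward pass; scales the gradient by -lam on the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam  # reversal strength (hypothetical hyperparameter)

    def forward(self, x):
        return x

    def backward(self, grad_output):
        return -self.lam * grad_output


def adversary_loss(w_enc, v_adv, x, target):
    """Squared-error loss of a toy scalar adversary on the encoded input."""
    z = w_enc * x          # toy encoder: scalar representation
    pred = v_adv * z       # toy adversary: prediction from the representation
    return 0.5 * (pred - target) ** 2


def encoder_gradient(w_enc, v_adv, x, target, grl):
    """Gradient of the adversary loss w.r.t. w_enc, as seen through the reversal layer."""
    z = grl.forward(w_enc * x)
    pred = v_adv * z
    d_pred = pred - target            # dL/dpred
    d_z = v_adv * d_pred              # dL/dz, the gradient arriving at the layer
    d_z_reversed = grl.backward(d_z)  # sign flipped (and scaled by lam)
    return d_z_reversed * x           # chain rule back to the encoder weight


if __name__ == "__main__":
    grl = GradientReversal(lam=1.0)
    w, v, x, t = 0.5, 3.0, 2.0, 1.0
    g = encoder_gradient(w, v, x, t, grl)
    w_new = w - 0.01 * g  # ordinary gradient-descent step on the encoder
    print("adversary loss before:", adversary_loss(w, v, x, t))
    print("adversary loss after: ", adversary_loss(w_new, v, x, t))
```

In a full model the adversary's own weights would still be updated with the unreversed gradient, so the classifier keeps trying to recognize the author while the encoder learns a representation that makes this harder.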

Published in Natural Language Processing Journal

ISSN: 2949-7191 (Online)
Publisher: Elsevier
Country of publisher: Netherlands
LCC subjects: Language and Literature: Philology. Linguistics: Computational linguistics. Natural language processing
Website: https://www.sciencedirect.com/journal/natural-language-processing-journal
