Shedding Light on the Dark Web: Authorship Attribution in Radical Forums

Leonardo Ranaldi; Federico Ranaldi; Francesca Fallucchi; Fabio Massimo Zanzotto

doi:10.3390/info13090435

Information (Sep 2022)

Affiliations

Leonardo Ranaldi: Department of Innovation and Information Engineering, Guglielmo Marconi University, 00193 Rome, Italy
Federico Ranaldi: Department of Enterprise Engineering, University of Rome Tor Vergata, 00133 Rome, Italy
Francesca Fallucchi: Department of Innovation and Information Engineering, Guglielmo Marconi University, 00193 Rome, Italy
Fabio Massimo Zanzotto: Department of Enterprise Engineering, University of Rome Tor Vergata, 00133 Rome, Italy

DOI: https://doi.org/10.3390/info13090435
Journal volume & issue: Vol. 13, no. 9, p. 435

Abstract

Online users often hide their real identities by adopting different names on the Internet. On Facebook or LinkedIn, for example, people usually appear under their real names, whereas on other standard websites, such as forums, they often use nicknames to protect their identities and preserve their anonymity. This poses a challenge for law enforcement trying to identify users who frequently change nicknames. In unmonitored contexts, such as the dark web, users expect strong identity protection. Without censorship, these users may create parallel social networks where they can engage in potentially malicious activities that could pose security threats. In this paper, we propose a solution to the problem of recognizing people who anonymize themselves behind nicknames, the authorship attribution (AA) task, in the challenging context of the dark web: specifically, an English-language Islamic forum dedicated to discussions of Islam and the Islamic world, in which members of radical Islamic groups are present. We provide an extensive analysis by testing models based on transformers, stylistic features, and syntactic features. The experiments show that models that analyze syntax and style perform better than pre-trained universal language models.

Published in Information

ISSN: 2078-2489 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: http://www.mdpi.com/journal/information/
