Fake News Spreaders Detection: Sometimes Attention Is Not All You Need

Marco Siino; Elisa Di Nuovo; Ilenia Tinnirello; Marco La Cascia

doi:10.3390/info13090426

Information (Sep 2022)

Affiliations

Marco Siino: Department of Engineering, University of Palermo, 90128 Palermo, PA, Italy
Elisa Di Nuovo: Dipartimento di Lingue e Letterature Straniere e Culture Moderne, University of Turin, 10124 Torino, TO, Italy
Ilenia Tinnirello: Department of Engineering, University of Palermo, 90128 Palermo, PA, Italy
Marco La Cascia: Department of Engineering, University of Palermo, 90128 Palermo, PA, Italy

DOI: https://doi.org/10.3390/info13090426
Journal volume & issue: Vol. 13, no. 9, p. 426

Abstract

Guided by a corpus linguistics approach, in this article we present a comparative evaluation of State-of-the-Art (SotA) models, with a special focus on Transformers, to address the task of Fake News Spreaders (i.e., users that share Fake News) detection. First, we explore the reference multilingual dataset for the considered task, exploiting corpus linguistics techniques such as the chi-square test, keywords and Word Sketch. Second, we perform experiments on several models for Natural Language Processing. Third, we perform a comparative evaluation using the most recent Transformer-based models (RoBERTa, DistilBERT, BERT, XLNet, ELECTRA, Longformer) and other deep and non-deep SotA models (CNN, MultiCNN, Bayes, SVM). The CNN outperforms all the other models tested and, to the best of our knowledge, any existing approach on the same dataset. Fourth, to better understand this result, we conduct a post-hoc analysis to investigate the behaviour of this best-performing black-box model. This study highlights the importance of choosing a suitable classifier for the specific task. To make an educated decision, we propose the use of corpus linguistics techniques. Our results suggest that large pre-trained deep models like Transformers are not necessarily the first choice when addressing a text classification task such as the one presented in this article. All the code developed to run our tests is publicly available on GitHub.
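
As a rough illustration of the chi-square keyword analysis mentioned in the abstract, the sketch below scores each word by how unevenly it is distributed between two classes of users. It is not the authors' code: the function names `chi_square_2x2` and `keyword_scores`, the whitespace tokenization and the two-class setup are assumptions made here for the example, and the published implementation may differ in all of these respects.

```python
from collections import Counter


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column total is zero, so the statistic is undefined;
        # returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def keyword_scores(spreader_docs, other_docs):
    """Score every word by its chi-square association with either class.

    Both arguments are lists of strings (hypothetical input format).
    Tokenization is a plain lowercase whitespace split, an assumption
    made to keep the example dependency-free.
    """
    spreader = Counter(w for doc in spreader_docs for w in doc.lower().split())
    other = Counter(w for doc in other_docs for w in doc.lower().split())
    n_s, n_o = sum(spreader.values()), sum(other.values())

    scores = {}
    for word in set(spreader) | set(other):
        a = spreader[word]  # occurrences of the word in class 1
        c = other[word]     # occurrences of the word in class 2
        # Rows: class 1 / class 2. Columns: this word / every other word.
        scores[word] = chi_square_2x2(a, n_s - a, c, n_o - c)
    return scores
```

Sorting the returned dictionary by value in descending order gives a candidate keyword list. The statistic alone does not say which class a word favours, so the raw counts still need to be compared to read the direction of the association.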

Published in Information

ISSN: 2078-2489 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: http://www.mdpi.com/journal/information/
