Addressing religious hate online: from taxonomy creation to automated detection

Alan Ramponi; Benedetta Testa; Sara Tonelli; Elisabetta Jezek

doi:10.7717/peerj-cs.1128

PeerJ Computer Science (Dec 2022)

Affiliations

Alan Ramponi: Fondazione Bruno Kessler, Trento, Italy
Benedetta Testa: Dipartimento di Studi Umanistici, Università di Pavia, Pavia, Italy
Sara Tonelli: Fondazione Bruno Kessler, Trento, Italy
Elisabetta Jezek: Dipartimento di Studi Umanistici, Università di Pavia, Pavia, Italy

DOI: https://doi.org/10.7717/peerj-cs.1128
Journal volume & issue: Vol. 8, p. e1128

Abstract

Abusive language in online social media is a pervasive and harmful phenomenon that calls for automatic computational approaches to be successfully contained. Previous studies have introduced corpora and natural language processing approaches for specific kinds of online abuse, mainly focusing on misogyny and racism. A currently underexplored area in this context is religious hate, for which efforts in data and methods to date have been rather scattered. This is exacerbated by the different annotation schemes that available datasets use, which inevitably leads to poor repurposing of data in wider contexts. Furthermore, religious hate depends heavily on country-specific factors, including the presence and visibility of religious minorities, societal issues, historical background, and current political decisions. Motivated by the lack of annotated data specifically targeting religion and the poor interoperability of current datasets, in this article we propose a fine-grained labeling scheme for religious hate speech detection. This scheme builds on a wider and highly interoperable taxonomy of abusive language, and covers the three main monotheistic religions: Judaism, Christianity, and Islam. Moreover, we introduce a Twitter dataset in two languages (English and Italian) that has been annotated following the proposed annotation scheme. We experiment with several classification algorithms on the annotated dataset, from traditional machine learning classifiers to recent transformer-based language models, assessing the difficulty of two tasks: abusive language detection and religious hate speech detection. Finally, we investigate the cross-lingual transferability of multilingual models on the tasks, shedding light on the viability of repurposing our dataset for religious hate speech detection in low-resource languages. We release the annotated data and publicly distribute the code for our classification experiments at https://github.com/dhfbk/religious-hate-speech.

Published in PeerJ Computer Science

ISSN: 2376-5992 (Online)
Publisher: PeerJ Inc.
Country of publisher: United States
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://peerj.com/computer-science/
