From online hate speech to offline hate crime: the role of inflammatory language in forecasting violence against migrant and LGBT communities

Carlos Arcila Calderón; Patricia Sánchez Holgado; Jesús Gómez; Marcos Barbosa; Haodong Qi; Alberto Matilla; Pilar Amado; Alejandro Guzmán; Daniel López-Matías; Tomás Fernández-Villazala

doi:10.1057/s41599-024-03899-1

Humanities & Social Sciences Communications (Oct 2024)

Affiliations

Carlos Arcila Calderón: University of Salamanca
Patricia Sánchez Holgado: University of Salamanca
Jesús Gómez: National Office for Combating Hate Crimes, Secretary of State for Security, Ministry of Interior
Marcos Barbosa: University of Salamanca
Haodong Qi: Malmö University
Alberto Matilla: National Office for Combating Hate Crimes, Secretary of State for Security, Ministry of Interior
Pilar Amado: National Office for Combating Hate Crimes, Secretary of State for Security, Ministry of Interior
Alejandro Guzmán: Universidad Autónoma de Madrid
Daniel López-Matías: Universidad Rey Juan Carlos
Tomás Fernández-Villazala: National Office for Combating Hate Crimes, Secretary of State for Security, Ministry of Interior

DOI: https://doi.org/10.1057/s41599-024-03899-1
Journal volume & issue: Vol. 11, no. 1, pp. 1–14

Abstract

Social media messages often provide insights into offline behaviors. Although hate speech proliferates rapidly across social media platforms, it is rarely recognized as a cybercrime, even when it may be linked to offline hate crimes that typically involve physical violence. This paper aims to anticipate violent acts by analyzing online hate speech (hatred, toxicity, and sentiment) and comparing it to offline hate crime. The dataset for this preregistered study included social media posts from X (previously called Twitter) and Facebook and internal police records of hate crimes reported in Spain between 2016 and 2018. After conducting preliminary data analysis to check the moderate temporal correlation, we used time series analysis to develop computational models (VAR, GLMNet, and XGBTree) to predict four time periods of these rare events on a daily and weekly basis. Forty-eight models were run to forecast two types of offline hate crimes, those against migrants and those against the LGBT community. The best model for migrant crime achieved an R² of 64%, while that for LGBT crime reached 53%. According to the best ML models, the weekly aggregations outperformed the daily aggregations, the national models outperformed those geolocated in Madrid, and those about migration were more effective than those about LGBT people. Moreover, toxic language outperformed hatred and sentiment analysis, Facebook posts were better predictors than tweets, and in most cases, speech temporally preceded crime. Although we do not make any claims about causation, we conclude that online inflammatory language could be a leading indicator for detecting potential hate crime acts and that these models can have practical applications for preventing these crimes.
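The idea that speech "temporally preceded crime" can be illustrated with a lagged correlation check. The following is a minimal sketch only, not the authors' pipeline: the function names `pearson` and `lagged_correlations` are hypothetical, the two weekly series are synthetic values invented for illustration (they are not data from the study), and plain Pearson correlation stands in for the VAR, GLMNet, and XGBTree models named in the abstract.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def lagged_correlations(speech, crime, max_lag):
    """Correlate speech in week t with crime in week t + lag, for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        x = speech[: len(speech) - lag]  # drop the last `lag` weeks of speech
        y = crime[lag:]                  # drop the first `lag` weeks of crime
        result[lag] = pearson(x, y)
    return result


# Synthetic weekly counts (assumption, for illustration only):
# the crime series is built to echo the speech series two weeks later.
speech = [3, 5, 2, 8, 6, 1, 7, 4, 9, 2, 5, 3]
crime = [0, 0] + speech[:-2]

corrs = lagged_correlations(speech, crime, max_lag=4)
best_lag = max(corrs, key=corrs.get)
print(best_lag, round(corrs[best_lag], 3))
```

A correlation that peaks at a positive lag is consistent with speech acting as a leading indicator, but, as the abstract itself stresses, it says nothing about causation.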

Published in Humanities & Social Sciences Communications

ISSN: 2662-9992 (Online)
Publisher: Springer Nature
Country of publisher: United Kingdom
LCC subjects: General Works: History of scholarship and learning. The humanities; Social Sciences
Website: https://www.nature.com/palcomms/
