Intelligent Multi-Lingual Cyber-Hate Detection in Online Social Networks: Taxonomy, Approaches, Datasets, and Open Challenges

Donia Gamal; Marco Alfonse; Salud María Jiménez-Zafra; Mostafa Aref

doi:10.3390/bdcc7020058

Big Data and Cognitive Computing (Mar 2023)

Affiliations

Donia Gamal: Computer Science Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo 11566, Egypt
Marco Alfonse: Computer Science Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo 11566, Egypt
Salud María Jiménez-Zafra: Computer Science Department, SINAI, CEATIC, Universidad de Jaén, 23071 Jaén, Spain
Mostafa Aref: Computer Science Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo 11566, Egypt

Journal volume & issue: Vol. 7, no. 2, p. 58

Abstract

Sentiment Analysis, also known as opinion mining, is the area of Natural Language Processing that aims to extract human perceptions, thoughts, and beliefs from unstructured textual content. With the emergence and rise of social media and the massive volume of individuals' reviews, comments, and feedback, it has become a useful, attractive, and challenging research area. One of the major problems evident in social media is toxic online textual content. People from diverse cultural backgrounds and beliefs access Internet sites while concealing their identity behind anonymity. Because of this freedom and anonymity, and the lack of regulation on social media, cyber toxicity and bullying speech are major issues that call for automated systems to detect and prevent them. Research in this area spans diverse languages and approaches, but a comprehensive study examining it from all aspects is clearly lacking. This manuscript presents a comprehensive, multi-lingual, and systematic review of cyber-hate sentiment analysis. It sets out the definition, properties, and taxonomy of cyberbullying and reports how often each type occurs. In addition, it presents the most recent popular cyberbullying benchmark datasets in different languages, indicating their number of classes (binary/multiple), and discusses the algorithms applied to them and how they were evaluated. Finally, it outlines the open challenges, possible solutions, and future directions.

Published in Big Data and Cognitive Computing

ISSN: 2504-2289 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology
Website: http://www.mdpi.com/journal/BDCC