So-haTRed: A Novel Hybrid System for Turkish Hate Speech Detection in Social Media With Ensemble Deep Learning Improved by BERT and Clustered-Graph Networks

Ayse Berna Altinel; Gozde Karatas Baydogmus; Sema Sahin; Mustafa Zahid Gurbuz

doi:10.1109/ACCESS.2024.3415350

IEEE Access (Jan 2024)

Affiliations

Ayse Berna Altinel: Department of Computer Engineering, Faculty of Technology, Marmara University, Maltepe, İstanbul, Turkey
Gozde Karatas Baydogmus: Department of Computer Engineering, Faculty of Technology, Marmara University, Maltepe, İstanbul, Turkey
Sema Sahin: Department of Computer Engineering, Faculty of Technology, Marmara University, Maltepe, İstanbul, Turkey
Mustafa Zahid Gurbuz: Department of Computer Engineering, Doğuş University, İstanbul, Turkey

DOI: https://doi.org/10.1109/ACCESS.2024.3415350
Journal volume & issue: Vol. 12, pp. 86252–86270

Abstract

Hate speech on online platforms, characterized by discriminatory language targeting individuals or groups, causes significant harm and calls for robust detection methods. Because such speech is easy to post online, this study investigates the detection of Turkish hate speech using deep learning algorithms and natural language processing techniques. We developed new methodologies, including a k-means+textGCN classifier with BERT, which is the first such attempt in the literature, and explored multiple vector representation techniques: Term Frequency, Word2Vec, Doc2Vec, and GloVe. We also evaluated a range of learning algorithms and natural language processing techniques on three distinct Turkish hate speech datasets. The newly presented algorithm exhibited superior performance, achieving an F1-score of 87.81% on the 9K dataset.
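
As a reading aid, the sketch below illustrates two of the ingredients named in the abstract: a Term Frequency representation and the F1-score used to report results. It is not the authors' implementation. The function names `term_frequency` and `f1_score`, the length-normalized form of term frequency, the binary (single positive class) form of F1, and the toy inputs are all assumptions made for illustration; the abstract does not state which term-frequency variant or F1 averaging scheme was used.

```python
from collections import Counter


def term_frequency(tokens):
    """Map each term to its count divided by the document length.

    Assumption: length-normalized TF; raw counts are an equally common choice.
    """
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)
    return {term: n / total for term, n in counts.items()}


def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up inputs (not taken from the paper's datasets).
tf = term_frequency(["a", "b", "a", "c"])
score = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
```

With these toy labels there is one true positive, one false positive, and one false negative, so precision and recall are both 0.5 and the F1-score is 0.5.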

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
