PDHS: Pattern-Based Deep Hate Speech Detection With Improved Tweet Representation

P. Sharmila; Kalaiarasi Sonai Muthu Anbananthen; Deisy Chelliah; Sudhaman Parthasarathy; Subarmaniam Kannan

doi:10.1109/ACCESS.2022.3210177

IEEE Access (Jan 2022)

Affiliations

P. Sharmila: Thiagarajar College of Engineering, Madurai, Tamil Nadu, India
Kalaiarasi Sonai Muthu Anbananthen: Faculty of Information Science and Technology, Multimedia University, Melaka, Malaysia
Deisy Chelliah: Thiagarajar College of Engineering, Madurai, Tamil Nadu, India
Sudhaman Parthasarathy: Thiagarajar College of Engineering, Madurai, Tamil Nadu, India
Subarmaniam Kannan: Faculty of Information Science and Technology, Multimedia University, Melaka, Malaysia

DOI: https://doi.org/10.1109/ACCESS.2022.3210177
Journal volume & issue: Vol. 10, pp. 105366–105376

Abstract

Automatically identifying hate speech in unstructured Twitter text is a significant challenge. Existing models depend heavily on feature engineering, which increases the time complexity of hate speech detection. This work aims to classify and detect hate speech using a linguistic pattern-based approach together with pre-trained transformer language models. To that end, we propose a novel Pattern-based Deep Hate Speech (PDHS) detection model that detects the presence of hate speech using a cross-attention encoder with a dual-level attention mechanism. Instead of concatenating features, the model computes dot-product attention, which yields a better representation by reducing irrelevant features. The first level of attention extracts aspect terms using predefined parts-of-speech tagging. The second level of attention extracts sentiment polarity to form a pattern. The model is trained on the extracted patterns together with term frequency, parts-of-speech tags, and sentiment scores. In experiments on a Twitter dataset, the model learns effective features that improve performance with minimal training time, attaining an F1 score of 88%.
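
The abstract contrasts dot-product attention with feature concatenation. As a rough illustration of that idea only, the sketch below computes scaled dot-product cross-attention in plain Python. It is not the PDHS implementation: the function names, the toy vectors, and the scaling by the square root of the key dimension are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross_attention(queries, keys, values):
    """Each query attends over all keys and returns a weighted mix of values.

    Scaling by sqrt(key dimension) is an assumption borrowed from standard
    transformer attention; the abstract only says "dot product attention".
    """
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        w = softmax([dot(q, k) / scale for k in keys])
        mixed = [
            sum(wi * v[j] for wi, v in zip(w, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
        weights.append(w)
    return outputs, weights


# Hypothetical toy data: two "aspect" queries attending over three token vectors.
aspect_queries = [[1.0, 0.0], [0.0, 1.0]]
token_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
token_values = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]

attended, attn_weights = cross_attention(aspect_queries, token_keys, token_values)
```

Because the attention weights in each row sum to one, tokens with low similarity to a query contribute little to its output, which is one way to read the abstract's claim that attention reduces irrelevant features compared with plain concatenation.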

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
