OLF-ML: An Offensive Language Framework for Detection, Categorization, and Offense Target Identification Using Text Processing and Machine Learning Algorithms

MD. Nahid Hasan; Kazi Shadman Sakib; Taghrid Tahani Preeti; Jeza Allohibi; Abdulmajeed Atiah Alharbi; Jia Uddin

doi:10.3390/math12132123

Mathematics (Jul 2024)

Affiliations

MD. Nahid Hasan: Department of Computer Science and Engineering, School of Data and Sciences, Brac University, Dhaka 1212, Bangladesh
Kazi Shadman Sakib: Department of Computer Science and Engineering, University of Dhaka, Dhaka 1000, Bangladesh
Taghrid Tahani Preeti: Department of Computer Science and Engineering, School of Data and Sciences, Brac University, Dhaka 1212, Bangladesh
Jeza Allohibi: Department of Mathematics, Taibah University, Madinah 42353, Saudi Arabia
Abdulmajeed Atiah Alharbi: Department of Mathematics, Taibah University, Madinah 42353, Saudi Arabia
Jia Uddin: Artificial Intelligence and Big Data Department, Endicott College, Woosong University, Daejeon 34606, Republic of Korea

Journal volume & issue: Vol. 12, no. 13, p. 2123

Abstract

The pervasiveness of offensive language on social media underscores the need for automated systems that identify and categorize such content. Effective identification and categorization are essential to a safer online environment and better communication. However, existing research faces challenges such as limited datasets and biased model performance, which hinder progress in this domain. To address these challenges, this research presents a comprehensive framework that simplifies the use of support vector machines (SVM), random forest (RF), and artificial neural networks (ANN). Using the Offensive Language Identification Dataset (OLID), the proposed methodology yields notable gains in three tasks: offensive language detection, automatic categorization of offensiveness, and offense target identification. The results indicate that SVM performs well across the three tasks, with accuracy scores of 77%, 88%, and 68%, precision scores of 76%, 87%, and 67%, F1 scores of 57%, 88%, and 68%, and recall rates of 45%, 88%, and 68%, suggesting that it is practical for identifying and moderating offensive content on social media. By applying sophisticated preprocessing and meticulous hyperparameter tuning, our model outperforms some earlier research on offensive language detection and categorization tasks.
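
As a rough aid to reading the reported scores, the following minimal sketch applies the standard definition of F1 as the harmonic mean of precision and recall to the three (precision, recall, F1) triples listed in the abstract. The helper name `f1_from_precision_recall` is hypothetical and is not taken from the paper; the abstract does not say how the scores were averaged over classes, so this is only a consistency check under that textbook definition, not a reproduction of the authors' evaluation.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (textbook F1).

    Hypothetical helper for illustration; inputs are fractions in [0, 1].
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) in percent, copied from the abstract.
reported = [(76, 45, 57), (87, 88, 88), (67, 68, 68)]

for p, r, f1_reported in reported:
    f1 = 100 * f1_from_precision_recall(p / 100, r / 100)
    print(f"precision={p}% recall={r}% -> F1={f1:.1f}% (reported {f1_reported}%)")
```

The first triple gives about 56.5%, which rounds to the reported 57%. The other two give about 87.5% and 67.5%, each within one point of the reported value. Small gaps of this kind would be expected if the published figures were rounded or averaged per class before reporting, but that is an assumption, since the abstract does not specify the averaging method.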

Published in Mathematics

ISSN: 2227-7390 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics
Website: http://www.mdpi.com/journal/mathematics
