Detecting Islamic Radicalism Arabic Tweets Using Natural Language Processing

Khalid T. Mursi; Mohammad D. Alahmadi; Faisal S. Alsubaei; Ahmed S. Alghamdi

doi:10.1109/ACCESS.2022.3188688

IEEE Access (Jan 2022)

Affiliations

Khalid T. Mursi: Department of Cybersecurity, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
Mohammad D. Alahmadi: Department of Software Engineering, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
Faisal S. Alsubaei: Department of Cybersecurity, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
Ahmed S. Alghamdi: Department of Cybersecurity, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia

DOI: https://doi.org/10.1109/ACCESS.2022.3188688
Journal volume & issue: Vol. 10
pp. 72526 – 72534

Abstract

Over the last two decades, extremists have distorted the image of Islam as a tolerant religion in many ways, such as luring teenagers into terrorist acts. Today, millions of users socialize and share ideas on social media platforms such as Twitter. Ideas shared on Twitter (tweets) typically reach and influence many people, who can simply retweet them and make them spread even faster. Unfortunately, some of these ideas are posted by extremists who share hateful Arabic content. It is therefore important to automate the monitoring and control of hateful Arabic tweets, given that Arabic is the most widely used language in the Islamic world. In this paper, we provide a manually labeled and curated dataset of 3,000 Arabic tweets, comprising both hateful and non-hateful tweets. To automate the detection of hateful tweets, we utilize advanced Machine Learning (ML) techniques and perform sentiment analysis, capturing the meaning of Arabic words with a word embedding (Word2Vec). We also use the proposed model to classify and analyze 100,000 tweets from the last decade. This work promotes future research on Arabic hate speech by providing a manually labeled Arabic dataset and a trained model (92% accuracy) that governments, Internet service providers, and social media applications can use as an underlying tool to detect inflammatory tweets before they spread to a wider audience.
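
The embedding-then-classify idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the three-dimensional vectors, the English placeholder tokens, the function names, and the nearest-centroid classifier are all assumptions made for illustration. A real system would presumably load Word2Vec vectors trained on an Arabic corpus and use whichever ML classifier the paper evaluates.

```python
from math import sqrt

# Hypothetical 3-dimensional embeddings with placeholder tokens. A real
# system would load vectors trained with Word2Vec on Arabic text.
EMBEDDINGS = {
    "peace":   [0.9, 0.1, 0.0],
    "friends": [0.8, 0.2, 0.1],
    "attack":  [0.1, 0.9, 0.2],
    "destroy": [0.0, 0.8, 0.3],
}


def embed(tokens, table=EMBEDDINGS):
    """Represent a tweet as the average of its known token vectors."""
    vectors = [table[t] for t in tokens if t in table]
    if not vectors:
        # No known token: fall back to a zero vector of the right size.
        return [0.0] * len(next(iter(table.values())))
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def fit_centroids(samples):
    """Compute one mean feature vector per label from (tokens, label) pairs."""
    grouped = {}
    for tokens, label in samples:
        grouped.setdefault(label, []).append(embed(tokens))
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(tokens, centroids):
    """Assign the label whose centroid is most similar to the tweet vector."""
    vector = embed(tokens)
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))


# Toy labeled data standing in for a manually labeled tweet dataset.
train = [
    (["peace", "friends"], "non-hateful"),
    (["friends"], "non-hateful"),
    (["attack", "destroy"], "hateful"),
    (["destroy"], "hateful"),
]
centroids = fit_centroids(train)
label = predict(["attack", "friends", "destroy"], centroids)
```

Averaging token vectors is only one simple way to turn a word embedding into a tweet-level feature, and the nearest-centroid rule is a stand-in chosen to keep the sketch dependency-free.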

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
