Unveiling Deception in Arabic: Optimization of Deceptive Text Detection Across Formal and Informal Genres

Fatimah Alhayan; Hanen T. Himdi; Basma Alharbi

doi:10.1109/access.2024.3424531

IEEE Access (Jan 2024)

Affiliations

Fatimah Alhayan: Department of Information Systems, College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University, P.O. Box 84428, Riyadh, Saudi Arabia
Hanen T. Himdi: Computer Science and Artificial Intelligence Department, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
Basma Alharbi: Computer Science and Artificial Intelligence Department, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia

DOI: https://doi.org/10.1109/access.2024.3424531
Journal volume & issue: Vol. 12
pp. 94216 – 94230

Abstract

In recent years, social media has reshaped how we share information and exchange messages. However, a serious problem arises from the rapid dissemination of deceptive information presented as legitimate, which can harm both individuals and society. Identifying unmonitored "deceptive text" has become a crucial concern in mainstream media because of its potentially damaging impact. Although recent studies have developed AI models capable of identifying deceptive text in other languages, research on detecting deceptive text in Arabic specifically remains scarce. This paper presents a novel Arabic deceptive text detection dataset constructed from publicly available resources. The dataset distinguishes between formal and informal text genres, reflecting the diverse communication styles encountered in real-world deceptive language. We evaluate the performance of various machine learning (ML), deep learning (DL), and transformer-based models on this dataset for classifying text as deceptive or non-deceptive. The study also investigates the impact of incorporating additional textual features, including morphological, psycholinguistic, and sociolinguistic features, alongside the raw text data. Our findings demonstrate that the AraBERTv2 model, after fine-tuning on the Arabic dataset and incorporating textual features, achieves the best classification performance. This research contributes a valuable resource for Arabic deceptive text analysis and highlights the effectiveness of fine-tuned AraBERTv2 models with enriched features for such tasks.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
