IEEE Access (Jan 2024)

Unveiling Deception in Arabic: Optimization of Deceptive Text Detection Across Formal and Informal Genres

  • Fatimah Alhayan,
  • Hanen T. Himdi,
  • Basma Alharbi

DOI
https://doi.org/10.1109/ACCESS.2024.3424531
Journal volume & issue
Vol. 12
pp. 94216 – 94230

Abstract

In recent years, social media has significantly influenced how we share information and exchange messages. However, a serious problem arises from the rapid dissemination of deceptive information presented as legitimate, which can harm both individuals and society. Identifying unmonitored ‘deceptive text’ has become a crucial concern in mainstream media because of its potentially damaging impact. Although recent studies have developed AI models capable of identifying deceptive text in other languages, little research has focused on detecting deceptive text specifically in Arabic. This paper presents a novel Arabic deceptive text detection dataset constructed from publicly available resources. The dataset distinguishes between formal and informal text genres, reflecting the diverse communication styles encountered in real-world deceptive language. We evaluate the performance of various machine learning (ML), deep learning (DL), and transformer-based models on this dataset for classifying text as deceptive or non-deceptive. The study also investigates the impact of incorporating additional textual features, including morphological, psycholinguistic, and sociolinguistic features, alongside the raw text data. Our findings demonstrate that the AraBERTv2 model, after fine-tuning on the Arabic dataset and incorporating textual features, achieves the best classification performance. This research contributes a valuable resource for Arabic deceptive text analysis and highlights the effectiveness of fine-tuned AraBERTv2 models with enriched features for such tasks.

Keywords