IEEE Access (Jan 2024)
Unveiling Deception in Arabic: Optimization of Deceptive Text Detection Across Formal and Informal Genres
Abstract
In recent years, social media has reshaped how we share information and exchange messages. However, deceptive information presented as legitimate now spreads rapidly, and it can seriously harm both individuals and society. Identifying unmonitored ‘deceptive text’ has therefore become a crucial concern in mainstream media because of its potentially damaging impact. Although recent studies have developed AI models capable of identifying deceptive text in other languages, little research has focused on detecting deceptive text in Arabic specifically. This paper presents a novel Arabic deceptive text detection dataset constructed from publicly available resources. The dataset distinguishes between formal and informal text genres, reflecting the diverse communication styles encountered in real-world deceptive language. We evaluate the performance of various machine learning (ML), deep learning (DL), and transformer-based models on this dataset for classifying text as deceptive or non-deceptive. The study also investigates the impact of incorporating additional textual features, including morphological, psycholinguistic, and sociolinguistic features, alongside the raw text data. Our findings demonstrate that the AraBERTv2 model, after fine-tuning on the Arabic dataset and incorporating these textual features, achieves the best classification performance. This research contributes a valuable resource for Arabic deceptive text analysis and highlights the effectiveness of fine-tuned AraBERTv2 models with enriched features for such tasks.
Keywords