Journal of King Saud University: Computer and Information Sciences (Jan 2022)
An ensemble approach for spam detection in Arabic opinion texts
Abstract
Nowadays, individuals share their experiences and opinions through online reviews. These reviews influence online marketing and help readers obtain real knowledge about products and services. However, some online reviews are fake. They may have been written to promote low-quality products or services, or to sabotage the reputation of a product or service, in order to mislead potential customers. Such misleading reviews are known as spam reviews and require careful attention. Prior spam detection research has focused on English reviews, with less attention paid to other languages. The detection of spam reviews in Arabic online sources is a relatively new topic despite the large amount of data generated. This paper therefore contributes to this topic by presenting four different methods for detecting Arabic spam reviews, with a focus on the construction and evaluation of an ensemble approach. The proposed ensemble method integrates a rule-based classifier with machine learning techniques and uses content-based features that depend on N-gram features and negation handling. The four proposed methods are evaluated on two datasets of different sizes. The results indicate the effectiveness of the ensemble approach: it achieves classification accuracies of 95.25% and 99.98% on the two datasets, outperforming existing related work by a margin of 25%.
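As an illustration of the kind of pipeline described above, the following is a minimal sketch, not the authors' implementation. The negation particle list, the `NOT_` marking scheme, the rule in `rule_vote`, and the majority-vote combination are all assumptions made for this example; the abstract does not specify how the rule-based classifier and the machine learning models are integrated. The machine learning classifiers are represented by hypothetical callables that map an N-gram feature dictionary to 0 (not spam) or 1 (spam).

```python
from collections import Counter

# Hypothetical negation particles, for illustration only (not the paper's list).
NEGATIONS = {"لا", "لم", "لن", "ليس", "ما"}


def tokens_with_negation(text):
    """Whitespace-tokenize and mark the token after a negation particle with NOT_."""
    out, negate = [], False
    for tok in text.split():
        if tok in NEGATIONS:
            negate = True  # drop the particle, mark the next token instead
            continue
        out.append("NOT_" + tok if negate else tok)
        negate = False
    return out


def ngrams(tokens, n_max=2):
    """Count all word N-grams for n = 1 .. n_max."""
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


def rule_vote(text):
    """Hypothetical rule: flag reviews that contain a link or highly repetitive wording."""
    toks = text.split()
    if not toks:
        return 0
    repetitive = len(set(toks)) / len(toks) < 0.5
    return 1 if ("http" in text or repetitive) else 0


def ensemble_predict(text, ml_classifiers):
    """Assumed combination scheme: strict majority vote over the rule and the ML models."""
    feats = ngrams(tokens_with_negation(text))
    votes = [rule_vote(text)] + [clf(feats) for clf in ml_classifiers]
    return 1 if sum(votes) * 2 > len(votes) else 0
```

In practice, each entry in `ml_classifiers` would wrap a trained model; majority voting is only one possible way to combine them with the rule-based output.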