Fast Detection of Deceptive Reviews by Combining the Time Series and Machine Learning

Minjuan Zhong; Zhenjin Li; Shengzong Liu; Bo Yang; Rui Tan; Xilong Qu

doi:10.1155/2021/9923374

Complexity (Jan 2021)

Affiliations

Minjuan Zhong: School of Information Technology, Hunan University of Finance and Economics, Changsha 410205, China
Zhenjin Li: School of Foreign Language, Hunan University of Finance and Economics, Changsha 410205, China
Shengzong Liu: School of Information Technology, Hunan University of Finance and Economics, Changsha 410205, China
Bo Yang: School of Information Technology, Hunan University of Finance and Economics, Changsha 410205, China
Rui Tan: School of Information Management, Jiangxi University of Finance and Economics, Nanchang 330013, China
Xilong Qu: School of Information Technology, Hunan University of Finance and Economics, Changsha 410205, China

DOI: https://doi.org/10.1155/2021/9923374
Journal volume & issue: Vol. 2021

Abstract

With the rapid growth of online product reviews, many users consult others’ opinions before deciding to purchase a product. Unfortunately, this reliance has encouraged the persistent use of fake reviews, which lead to many wrong purchase decisions. The effective identification of deceptive reviews has therefore become a crucial yet challenging task. Existing supervised learning methods require a large number of examples of deceptive and truthful opinions labeled by domain experts, while the available unsupervised learning methods are inefficient because they depend on reviewer features to detect each fake review. To address both the detection efficiency problem and the dependence on large amounts of labeled examples, this paper proposes a semisupervised learning approach for detecting spam reviews. First, a time series model of all the reviews of a product is constructed, and suspicious time intervals are captured based on bursts in the number of reviews within them. Second, a two-view co-training semisupervised learning algorithm is run in each captured interval, combining linguistic cues, metadata, and user purchase behaviors to classify each review as spam or not. A series of numerical experiments on a real dataset acquired from Taobao.com confirm the effectiveness of the proposed model: it is time-efficient and accurate, and it overcomes the main shortcoming of supervised learning methods, namely their dependence on large amounts of labeled examples. The approach thus achieves a trade-off between accuracy and efficiency.
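
The first stage described in the abstract, capturing suspicious intervals from bursts in a product's review time series, can be illustrated with a minimal sketch. The abstract does not state the paper's actual burst criterion, so everything specific here is an assumption: the daily granularity, the rule "daily count above mean + k * standard deviation", the default `k = 1.5`, and the helper names `daily_counts` and `burst_intervals` are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from datetime import date, timedelta
from statistics import mean, pstdev


def daily_counts(review_dates):
    """Build a gap-free daily series of review counts for one product."""
    counts = Counter(review_dates)
    start, end = min(counts), max(counts)
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    return days, [counts.get(d, 0) for d in days]


def burst_intervals(review_dates, k=1.5):
    """Return (first_day, last_day) pairs of consecutive burst days.

    Assumed rule (not taken from the paper): a day is a burst day when
    its review count exceeds mean + k * population standard deviation.
    """
    days, series = daily_counts(review_dates)
    threshold = mean(series) + k * pstdev(series)
    intervals, current = [], None
    for day, n in zip(days, series):
        if n > threshold:
            # Extend the open interval, or open a new one on this day.
            current = (current[0], day) if current else (day, day)
        elif current:
            intervals.append(current)
            current = None
    if current:
        intervals.append(current)
    return intervals
```

Under these assumptions, only the reviews that fall inside the returned intervals would be passed on to the second-stage classifier, which is where the efficiency gain described in the abstract would come from: the more expensive per-review analysis is skipped for the rest of the timeline.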

Published in Complexity

ISSN: 1076-2787 (Print); 1099-0526 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://onlinelibrary.wiley.com/journal/8503
