Applied Bionics and Biomechanics (Jan 2021)
Development of Integrated Neural Network Model for Identification of Fake Reviews in E-Commerce Using Multidomain Datasets
Abstract
Online product reviews play a major role in the success or failure of an E-commerce business. Before procuring products or services, the shoppers usually go through the online reviews posted by previous customers to get recommendations of the details of products and make purchasing decisions. Nevertheless, it is possible to enhance or hamper specific E-business products by posting fake reviews, which can be written by persons called fraudsters. These reviews can cause financial loss to E-commerce businesses and misguide consumers to take the wrong decision to search for alternative products. Thus, developing a fake review detection system is ultimately required for E-commerce business. The proposed methodology has used four standard fake review datasets of multidomains include hotels, restaurants, Yelp, and Amazon. Further, preprocessing methods such as stopword removal, punctuation removal, and tokenization have performed as well as padding sequence method for making the input sequence has fixed length during training, validation, and testing the model. As this methodology uses different sizes of datasets, various input word-embedding matrices of n-gram features of the review’s text are developed and created with help of word-embedding layer that is one component of the proposed model. Convolutional and max-pooling layers of the CNN technique are implemented for dimensionality reduction and feature extraction, respectively. Based on gate mechanisms, the LSTM layer is combined with the CNN technique for learning and handling the contextual information of n-gram features of the review’s text. Finally, a sigmoid activation function as the last layer of the proposed model receives the input sequences from the previous layer and performs binary classification task of review text into fake or truthful. In this paper, the proposed CNN-LSTM model was evaluated in two types of experiments, in-domain and cross-domain experiments. For an in-domain experiment, the model is applied on each dataset individually, while in the case of a cross-domain experiment, all datasets are gathered and put into a single data frame and evaluated entirely. The testing results of the model in-domain experiment datasets were 77%, 85%, 86%, and 87% in the terms of accuracy for restaurant, hotel, Yelp, and Amazon datasets, respectively. Concerning the cross-domain experiment, the proposed model has attained 89% accuracy. Furthermore, comparative analysis of the results of in-domain experiments with existing approaches has been done based on accuracy metric and, it is observed that the proposed model outperformed the compared methods.