Detecting Fake Reviews Using BERT and Sublinear_TF Methods on Hotel Reviews in the Lombok Tourism Area

Zulpan Hadi; M. Zulpahmi; Zaenudin; Akmaludin Asrory

doi:10.30871/jaic.v8i2.8721

Journal of Applied Informatics and Computing (Nov 2024)

Affiliations

Zulpan Hadi: Universitas Teknologi Mataram
M. Zulpahmi: Universitas Teknologi Mataram
Zaenudin: Universitas Teknologi Mataram
Akmaludin Asrory: Universitas Teknologi Mataram

DOI: https://doi.org/10.30871/jaic.v8i2.8721
Journal volume & issue: Vol. 8, no. 2, pp. 550–556

Abstract

The number of visitors to Lombok, one of Indonesia's best-known tourist destinations, rose from 400,595 in 2020 to 1,376,295 in 2022. Although the government supports the hotel industry, fake reviews remain a significant problem: they can damage hotel reputations and mislead tourists. This study applies two feature extraction techniques, BERT and Sublinear_TF, to detect fake hotel reviews from three main areas: Gili Trawangan, Senggigi, and Kuta. BERT captures the context in which words appear, while Sublinear_TF emphasizes more informative words by reducing the weight of irrelevant common words. The larger and more diverse Gili Trawangan dataset gave the best classification results. With BERT features, accuracy on Gili Trawangan was 0.79 with SVM and 0.84 with Random Forest, the highest in the study. With Sublinear_TF features, accuracy on Gili Trawangan was 0.82 with SVM and 0.83 with Random Forest. Both techniques performed worse on the smaller and more homogeneous Senggigi and Kuta datasets. Sublinear_TF works better on varied data and less well on homogeneous data, while BERT with Random Forest achieved the highest accuracy owing to its ability to capture broader language context. These results suggest that dataset size and variety strongly influence the success of fake review classification techniques.
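The term-weighting idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes that Sublinear_TF refers to the usual sublinear term-frequency scaling, in which a raw count tf is replaced by 1 + ln(tf), as the `sublinear_tf` option of scikit-learn's `TfidfVectorizer` does. The abstract does not spell out the exact formula used in the study, so this is an assumption, and the review text and the function name `sublinear_tf` are hypothetical.

```python
import math
from collections import Counter


def sublinear_tf(tokens):
    """Map each term to 1 + ln(count), damping the effect of repeated terms."""
    counts = Counter(tokens)
    return {term: 1.0 + math.log(count) for term, count in counts.items()}


# Hypothetical toy review, not taken from the study's dataset.
review = "great hotel great pool great staff great view clean room".split()

raw = Counter(review)          # raw term counts
scaled = sublinear_tf(review)  # sublinearly scaled weights

for term in sorted(raw):
    print(f"{term:>6}  raw={raw[term]}  scaled={scaled[term]:.3f}")
```

In this toy review the repeated word "great" outweighs "hotel" by a factor of 4 in raw counts but by roughly 2.4 after scaling, so a word that is merely repeated dominates the feature vector less. In a full TF-IDF pipeline, the inverse document frequency term would additionally lower the weight of words that are common across many reviews.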

Published in Journal of Applied Informatics and Computing

ISSN: 2548-6861 (Online)
Publisher: Politeknik Negeri Batam
Country of publisher: Indonesia
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://jurnal.polibatam.ac.id/index.php/JAIC
