IEEE Access (Jan 2023)

An Automatic Detection System for Fake Japanese Shopping Sites Using fastText and LightGBM

  • Keisuke Sakai,
  • Kosuke Takeshige,
  • Kazuki Kato,
  • Naoki Kurihara,
  • Katsumi Ono,
  • Masaki Hashimoto

DOI
https://doi.org/10.1109/ACCESS.2023.3323218
Journal volume & issue
Vol. 11
pp. 111389 – 111401

Abstract

In recent years, the number of fake shopping sites that scam people out of their money or steal their personal information has skyrocketed. To address this problem, Japanese law enforcement agencies such as the police have been detecting fake shopping sites through information provided by third parties and through manual investigations. However, this approach is inefficient. Although a number of recent studies use machine learning to detect fake sites, there is still no system for automatically detecting fake shopping sites. Therefore, in this study, we developed an automatic detection system for fake shopping sites to solve the detection inefficiency faced by law enforcement agencies in Japan. The proposed system identified an average of 118,000 target URLs per day from the list of newly registered domains and collected an average of 51,000 sets of HTML data. Using machine learning, it determined with 98.5% accuracy whether the collected data came from fake shopping sites. Because the system also met the time requirements for actual operation, we conclude that it works as a practical automatic detection system for fake Japanese shopping sites.

Keywords