Mathematics (Jan 2024)

A Sampling-Based Method for Detecting Data Poisoning Attacks in Recommendation Systems

  • Mohan Li
  • Yuxin Lian
  • Jinpeng Zhu
  • Jingyi Lin
  • Jiawen Wan
  • Yanbin Sun

DOI
https://doi.org/10.3390/math12020247
Journal volume & issue
Vol. 12, no. 2
p. 247

Abstract

Recommendation algorithms based on collaborative filtering are vulnerable to data poisoning attacks, in which attackers manipulate system output by injecting a large volume of fake rating data. To address this issue, it is essential to investigate methods for detecting systematically injected poisoning data within the rating matrix. Because attackers often inject a significant quantity of poisoning data in a short period to achieve their desired impact, these data may exhibit spatial proximity; that is, poisoning data may be concentrated in adjacent rows of the rating matrix. This paper exploits this proximity characteristic and introduces a sampling-based method for detecting data poisoning attacks. First, we design a rating matrix sampling method specifically for detecting poisoning data. By sampling differences obtained from the original rating matrix, it is possible to infer the presence of poisoning attacks and effectively discard poisoning data. Second, we develop a method for pinpointing malicious data based on the distance between rating vectors. Through distance calculations, we can accurately identify the positions of malicious data. Finally, we validate the method on three real-world datasets. The results demonstrate the effectiveness of our method in identifying malicious data within the rating matrix.
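
The proximity idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify in detail; it is a simplified stand-in that assumes injected profiles occupy consecutive rows and lie close to one another in Euclidean distance. The function names, the toy matrix, the threshold, and the minimum run length are all hypothetical choices made for illustration.

```python
import math


def row_distance(a, b):
    """Euclidean distance between two rating vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flag_adjacent_runs(matrix, threshold, min_run):
    """Return indices of rows that sit in a run of at least `min_run`
    adjacent rows whose consecutive distances are all <= `threshold`.

    Assumption (not from the paper): near-identical neighbouring rows
    are treated as a possible sign of bulk injection.
    """
    flagged = []
    run = [0] if matrix else []
    for i in range(1, len(matrix)):
        if row_distance(matrix[i - 1], matrix[i]) <= threshold:
            run.append(i)  # row i continues the current run
        else:
            if len(run) >= min_run:
                flagged.extend(run)
            run = [i]  # start a new run at row i
    if len(run) >= min_run:
        flagged.extend(run)
    return flagged


# Toy rating matrix (rows = users, columns = items, 0 = unrated).
# Rows 3 to 6 mimic injected profiles that all push item 2.
ratings = [
    [5, 3, 0, 1, 4],
    [1, 0, 4, 5, 2],
    [4, 1, 2, 0, 5],
    [1, 1, 5, 1, 1],
    [1, 1, 5, 1, 2],
    [1, 2, 5, 1, 1],
    [1, 1, 5, 2, 1],
    [2, 5, 0, 3, 0],
]

suspicious = flag_adjacent_runs(ratings, threshold=1.5, min_run=3)
print(suspicious)  # [3, 4, 5, 6]
```

On real data the threshold and run length would need tuning, and genuine users with similar tastes who happen to be stored next to each other would produce false positives, so a check like this is best read as a coarse first filter.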

Keywords