Digital Communications and Networks (Apr 2024)

Data complexity-based batch sanitization method against poison in distributed learning

  • Silv Wang,
  • Kai Fan,
  • Kuan Zhang,
  • Hui Li,
  • Yintang Yang

Journal volume & issue
Vol. 10, no. 2
pp. 416 – 428

Abstract

The security of Federated Learning (FL) and Distributed Machine Learning (DML) is gravely threatened by data poisoning attacks, which destroy the usability of the model by contaminating training samples; such attacks are therefore called causative availability indiscriminate attacks. Existing data sanitization methods are hard to apply to real-time applications because of their tedious processes and heavy computation. To address this problem, we propose a new supervised batch detection method for poison that can quickly sanitize the training dataset before local model training. We design a training dataset generation method that helps to enhance accuracy, and we use data complexity features to train a detection model, which is then applied in an efficient batch hierarchical detection process. Our model accumulates knowledge about poison, and this knowledge can be expanded by retraining to adapt to new attacks. Being neither attack-specific nor scenario-specific, our method is applicable to FL/DML as well as to other online or offline scenarios.
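
The idea of batch hierarchical detection driven by a data complexity feature can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the features, the detection model, or the splitting rule. The sketch assumes a single feature (the leave-one-out 1-nearest-neighbour error rate, in the spirit of neighbourhood-based complexity measures), a fixed threshold standing in for the trained detection model, and recursive halving of flagged batches. The names `nn_error_rate`, `is_suspicious`, and `sanitize`, and the values `threshold=0.15` and `min_size=8`, are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (feature vector, class label)


def nn_error_rate(batch: List[Sample]) -> float:
    """Leave-one-out 1-nearest-neighbour error rate of a labelled batch."""
    if len(batch) < 2:
        return 0.0
    errors = 0
    for i, (xi, yi) in enumerate(batch):
        nearest_label, nearest_dist = None, float("inf")
        for j, (xj, yj) in enumerate(batch):
            if i == j:
                continue
            dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
            if dist < nearest_dist:
                nearest_label, nearest_dist = yj, dist
        if nearest_label != yi:
            errors += 1
    return errors / len(batch)


def is_suspicious(batch: List[Sample], threshold: float = 0.15) -> bool:
    # Assumption: a fixed threshold on one complexity feature stands in
    # for the trained detection model described in the abstract.
    return nn_error_rate(batch) > threshold


def sanitize(batch: List[Sample], min_size: int = 8,
             threshold: float = 0.15) -> List[Sample]:
    """Keep batches that look clean; halve flagged ones to localise poison."""
    if not is_suspicious(batch, threshold):
        return list(batch)
    if len(batch) <= min_size:
        return []  # smallest flagged unit is discarded as a whole
    mid = len(batch) // 2
    return (sanitize(batch[:mid], min_size, threshold)
            + sanitize(batch[mid:], min_size, threshold))


# Toy data: two well-separated 1-D classes, around 0..3 and 10..13.
clean_part: List[Sample] = []
for k in range(24):
    clean_part.append(([k * 0.1], 0))
    clean_part.append(([10 + k * 0.1], 1))

# A contaminated stretch: clean samples interleaved with label-flipped ones.
mixed_part: List[Sample] = []
for k in range(24, 28):
    mixed_part.append(([k * 0.1], 0))               # clean
    mixed_part.append(([k * 0.1 + 0.05], 1))        # flipped label
    mixed_part.append(([10 + k * 0.1], 1))          # clean
    mixed_part.append(([10 + k * 0.1 + 0.05], 0))   # flipped label

batch = clean_part + mixed_part
kept = sanitize(batch)
print(len(batch), len(kept))
```

In this toy run the flagged sub-batches are discarded whole, so the clean samples that share them with the flipped ones are lost as well. A smaller `min_size` would localise poison more finely, at the cost of computing the complexity feature on fewer samples.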

Keywords