IEEE Access (Jan 2022)

Black-Box Reward Attacks Against Deep Reinforcement Learning Based on Successor Representation

  • Kanting Cai,
  • Xiangbin Zhu,
  • Zhao-Long Hu

DOI
https://doi.org/10.1109/ACCESS.2022.3174963
Journal volume & issue
Vol. 10
pp. 51548–51560

Abstract


Although deep reinforcement learning (DRL) has been widely adopted across many fields, studying the vulnerability of DRL has become an important research topic for improving the robustness of DRL agents. Adversarial attack methods based on white-box models, where the adversary can access all of the victim's information, have been intensively investigated. In most practical situations, however, the adversary cannot obtain the internal information of the victim's neural network. Furthermore, for reward-based attacks, the agent can run anomaly detection on the perturbed rewards to determine whether it is under attack. In this paper, we propose a black-box attack method with corrupted rewards that exploits DRL exploration mechanisms to improve the effectiveness of attacks on agents. The adversary first trains a deep neural network to learn the successor representation (SR) of each state. The adversary then uses the SR values to choose the timing of attacks and to generate imperceptible adversarial perturbations. Experimental results show that the proposed SR-based black-box attack algorithm can effectively attack agents with fewer adversarial samples.
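To make the SR-based attack-timing idea concrete, the following is a minimal sketch in a tabular setting rather than the paper's deep-network formulation. The chain MDP, the TD learning rule for the SR, the novelty score, and the thresholding heuristic are all illustrative assumptions, not the authors' exact method: the SR matrix M[s, s'] estimates the expected discounted number of future visits to s' from s, and states with low SR-based visitation are flagged as moments to perturb the reward.

```python
import numpy as np

# Assumed toy setup: a 5-state chain MDP with a right-biased random walk.
n_states = 5
gamma, alpha = 0.9, 0.1

# SR matrix: M[s, s'] ~ expected discounted future visits to s' starting from s.
M = np.zeros((n_states, n_states))

rng = np.random.default_rng(0)
s = 0
for _ in range(5000):
    # Move right with probability 0.8, left otherwise (clipped to the chain).
    s_next = min(s + 1, n_states - 1) if rng.random() < 0.8 else max(s - 1, 0)
    one_hot = np.eye(n_states)[s]
    # TD update for the SR: target = current occupancy + discounted SR of successor.
    M[s] += alpha * (one_hot + gamma * M[s_next] - M[s])
    s = s_next

# Attack-timing heuristic (assumption): perturb rewards only in states the agent
# visits rarely, i.e. states whose SR column norm is low (high "novelty").
novelty = 1.0 / (np.linalg.norm(M, axis=0) + 1e-8)
attack_states = np.flatnonzero(novelty > np.median(novelty))
print(attack_states)
```

Restricting perturbations to rarely visited states is one way such an attack could stay imperceptible: the corrupted rewards occur too infrequently for simple anomaly detection on the reward stream to flag them.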

Keywords