IEEE Access (Jan 2024)
Mask-Attention A3C: Visual Explanation of Action–State Value in Deep Reinforcement Learning
Abstract
Deep reinforcement learning (DRL) can learn an agent's optimal behavior from the experience it gains by interacting with its environment. However, since the decision-making process of DRL agents is a black box, it is difficult for users to understand the reasons for the agents' actions. To date, conventional visual explanation methods for DRL agents have focused only on the policy and not on the state value. In this work, we propose a DRL method called Mask-Attention A3C (Mask A3C) that analyzes an agent's decision-making by focusing on both the policy and value branches, which produce different outputs. Building on the actor-critic framework, our method introduces an attention mechanism that applies mask processing to the feature maps of the policy and value branches using mask-attention, a heat-map representation of the evidence behind the policy and the state value. We also introduce a mask-attention loss to obtain highly interpretable mask-attention; with this loss function, the agent learns not to gaze at regions that do not affect its decision-making. Our evaluations on Atari 2600 video game tasks and a robot manipulation task showed that visualizing an agent's mask-attention during action selection facilitates the analysis of its decision-making. We also investigated the effect of the mask-attention loss and confirmed its usefulness for analyzing agents' decision-making. In addition, a user survey on predicting agent behavior showed that these mask-attentions are highly interpretable to users.
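For concreteness, the following is a minimal PyTorch sketch of how per-branch mask-attention of this kind could be realized. The 1×1-convolution-plus-sigmoid attention heads, the encoder layer sizes, and the L1 sparsity form of the mask-attention loss are illustrative assumptions, not the exact implementation described in the paper.

```python
import torch
import torch.nn as nn


class MaskAttentionA3C(nn.Module):
    """Sketch of an actor-critic network with per-branch mask-attention."""

    def __init__(self, in_channels: int, n_actions: int):
        super().__init__()
        # Shared convolutional encoder producing a spatial feature map
        # (standard Atari-style layer sizes, assumed for illustration).
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
        )
        # One attention head per branch: a 1x1 conv followed by a sigmoid
        # yields a heat-map-like mask in [0, 1] over the feature map.
        self.policy_attn = nn.Conv2d(64, 1, kernel_size=1)
        self.value_attn = nn.Conv2d(64, 1, kernel_size=1)
        # For an 84x84 input, the encoder output is 64 x 7 x 7.
        self.policy_head = nn.Linear(64 * 7 * 7, n_actions)
        self.value_head = nn.Linear(64 * 7 * 7, 1)

    def forward(self, obs):
        feat = self.encoder(obs)                         # (B, 64, 7, 7)
        mask_pi = torch.sigmoid(self.policy_attn(feat))  # policy mask-attention
        mask_v = torch.sigmoid(self.value_attn(feat))    # value mask-attention
        # Mask processing: element-wise gating of the shared feature map,
        # done separately for the policy and value branches.
        logits = self.policy_head((feat * mask_pi).flatten(1))
        value = self.value_head((feat * mask_v).flatten(1))
        return logits, value, mask_pi, mask_v


def mask_attention_loss(mask_pi, mask_v, coef=1e-3):
    """Illustrative sparsity penalty (an assumption): discourages the agent
    from gazing at regions that do not affect its decision-making."""
    return coef * (mask_pi.mean() + mask_v.mean())
```

In this sketch, the mask-attention loss would simply be added to the usual A3C policy and value losses during training; because the masks are sigmoid-gated, pushing their mean toward zero drives attention off irrelevant regions while the task losses keep it on decision-relevant ones.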
Keywords