IEEE Access (Jan 2024)

A Replaceable Curiosity-Driven Candidate Agent Exploration Approach for Task-Oriented Dialog Policy Learning

  • Xuecheng Niu,
  • Akinori Ito,
  • Takashi Nose

DOI
https://doi.org/10.1109/ACCESS.2024.3462719
Journal volume & issue
Vol. 12
pp. 142640–142650

Abstract

Task-oriented dialog policy learning is often formulated as a Reinforcement Learning (RL) problem in which rewards from the environment are extremely sparse, so an agent acting randomly will rarely find the reward. Exploration techniques are therefore of primary importance when solving RL problems, and more sophisticated exploration methods must be devised. In this study, we propose a replaceable curiosity-driven candidate agent exploration approach that encourages the agent to balance action sampling and explore new environments without overly violating dialog strategies. In this framework, we adopt the curiosity model but introduce a weight on the curiosity reward to balance exploration and exploitation. We design a multi-candidate agent mechanism that selects an agent with relatively balanced action sampling for formal dialog training, motivating agents to escape pseudo-optimal actions in the early training stage. In addition, we propose, for the first time, a replacement mechanism that prevents the elected agent from performing poorly in the later stages of training and fully utilizes all candidate agents. The experimental results show that the adjustable curiosity reward promotes dialog policy convergence, and that the agent replacement mechanism effectively blocks the training of poorly trained agents, significantly increasing the average task success rate and reducing the number of dialog turns. Compared with the baselines, the replaceable curiosity-driven candidate agent exploration approach achieves a higher average success rate of 0.714 and a lower average number of turns of 20.6.
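
The weighted curiosity reward described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the prediction-error-based curiosity signal and the curiosity_weight parameter below are illustrative assumptions about how a weighted intrinsic reward might be combined with the sparse extrinsic dialog reward.

import numpy as np

def curiosity_reward(predicted_next_state, actual_next_state):
    # Intrinsic reward as the forward-model prediction error (ICM-style);
    # larger error means the state transition is less familiar to the agent.
    return float(np.sum((predicted_next_state - actual_next_state) ** 2))

def shaped_reward(extrinsic_reward, predicted_next_state, actual_next_state,
                  curiosity_weight=0.1):
    # Total reward = sparse extrinsic dialog reward + weighted curiosity bonus.
    # curiosity_weight is a hypothetical knob balancing exploration/exploitation.
    bonus = curiosity_reward(predicted_next_state, actual_next_state)
    return extrinsic_reward + curiosity_weight * bonus

# Usage example with toy dialog-state vectors.
if __name__ == "__main__":
    pred = np.array([0.2, 0.8, 0.1])
    actual = np.array([0.0, 1.0, 0.0])
    print(shaped_reward(extrinsic_reward=-1.0,
                        predicted_next_state=pred,
                        actual_next_state=actual))

In this sketch, lowering curiosity_weight as training progresses would shift the agent from exploration toward exploiting the learned dialog strategy, which mirrors the balancing role the abstract attributes to the designed weight.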

Keywords