SICE Journal of Control, Measurement, and System Integration (Sep 2019)
Proposal and Evaluation of Detour Path Suppression Method in PS Reinforcement Learning
Abstract
Profit sharing (PS) is a well-known reinforcement learning method. In PS, a reward is generally distributed according to a geometrically decreasing function, whose common ratio is called the discount rate. A large discount rate speeds up learning but may lead to a non-optimal policy. A small discount rate, on the other hand, improves the quality of the learned policy, but learning may not proceed smoothly because the learning depth is shallow. To address this trade-off, we propose a method that reinforces the detour path and the non-detour path with different discount rates. We apply the method to a maze problem and an altruistic multi-agent environment to confirm its effectiveness.
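As an illustration of the geometrically decreasing reward distribution mentioned above, the following is a minimal sketch of generic PS credit assignment over one episode. It is not the proposed detour-suppression method; the function name `distribute_reward`, the dictionary-based weight table, and the example episode are assumptions made for illustration only.

```python
def distribute_reward(episode, reward, discount_rate, weights=None):
    """Credit every (state, action) pair of an episode, last step first.

    Hypothetical helper: the step that reached the goal receives the full
    reward, and each earlier step receives the previous credit multiplied
    by the common ratio (the discount rate).
    """
    weights = {} if weights is None else weights
    credit = reward
    for state, action in reversed(episode):
        key = (state, action)
        weights[key] = weights.get(key, 0.0) + credit
        credit *= discount_rate  # geometric decrease toward the episode start
    return weights


# Illustrative three-step episode ending at the goal (assumed data).
episode = [("s0", "right"), ("s1", "up"), ("s2", "right")]

deep = distribute_reward(episode, reward=1.0, discount_rate=0.5)
shallow = distribute_reward(episode, reward=1.0, discount_rate=0.1)
```

With a common ratio of 0.5 the three steps receive credits of 1.0, 0.5, and 0.25, so the reward still reaches the start of the episode. With a ratio of 0.1 the first step receives only about 0.01, which illustrates the shallow learning depth associated with a small discount rate.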
Keywords