SICE Journal of Control, Measurement, and System Integration (Sep 2019)

Utilizing Observed Information for No-Communication Multi-Agent Reinforcement Learning toward Cooperation in Dynamic Environment

  • Fumito Uwano,
  • Keiki Takadama

DOI
https://doi.org/10.9746/jcmsi.12.199
Journal volume & issue
Vol. 12, no. 5
pp. 199–208

Abstract


This paper proposes a multi-agent reinforcement learning method that requires no communication and is aimed at dynamic environments, called profit minimizing reinforcement learning with oblivion of memory (PMRL-OM). PMRL-OM extends PMRL by defining a memory range so that only valuable information from the environment is utilized. Because agents do not need information observed before an environmental change, the memory range restricts them to information acquired after a certain iteration. In addition, PMRL-OM improves the update function for the goal value, which serves as the priority of a purpose, and updates the goal value based on newer information. To evaluate its effectiveness, this study compares PMRL-OM with PMRL in five dynamic maze environments: state changes for two types of cooperation, position changes for two types of cooperation, and a case combining these four. The experimental results revealed that: (a) PMRL-OM enabled effective cooperation in all five dynamic environments examined in this study; (b) PMRL-OM was more effective than PMRL in these environments; and (c) PMRL-OM performed well with a memory range of 100 to 500.
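For intuition only, the following is a minimal Python sketch of the two ideas the abstract describes: a fixed memory range that forgets older observations, and a goal-value update biased toward newer information. The class name `MemoryRangeAgent`, the recency-weighted update, and all parameters are illustrative assumptions, not the authors' actual PMRL-OM algorithm.

```python
from collections import deque


class MemoryRangeAgent:
    """Illustrative sketch (assumed, not the paper's implementation):
    keep only observations within a fixed memory range and update a
    per-goal value so that newer information dominates."""

    def __init__(self, memory_range=300, alpha=0.1):
        # memory_range of roughly 100-500 iterations reflects the abstract's finding;
        # alpha is a hypothetical learning rate for the recency-weighted update.
        self.memory_range = memory_range
        self.alpha = alpha
        self.observations = deque(maxlen=memory_range)  # older entries fall out automatically
        self.goal_values = {}  # goal id -> priority value

    def observe(self, iteration, goal_id, steps_to_goal):
        # Only information within the memory range is retained, so observations
        # made before an environmental change are eventually discarded.
        self.observations.append((iteration, goal_id, steps_to_goal))
        self._update_goal_value(goal_id, steps_to_goal)

    def _update_goal_value(self, goal_id, steps_to_goal):
        # Recency-weighted update: the goal value (used as the goal's priority)
        # moves toward the latest observation. The exact update rule in PMRL-OM
        # may differ; this is an assumed form.
        old = self.goal_values.get(goal_id, float(steps_to_goal))
        self.goal_values[goal_id] = old + self.alpha * (steps_to_goal - old)


# Example usage of the hypothetical sketch:
agent = MemoryRangeAgent(memory_range=300)
agent.observe(iteration=1, goal_id="G1", steps_to_goal=12)
agent.observe(iteration=2, goal_id="G1", steps_to_goal=8)
print(agent.goal_values["G1"])
```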

Keywords