Multi-Agent Reinforcement Learning with Optimal Equivalent Action of Neighborhood
Haixing Wang, Yi Yang, Zhiwei Lin, Tian Wang
Affiliations
Haixing Wang
Henan Key Laboratory of Intelligent Detection and Control of Coal Mine Equipment, School of Electrical Engineering and Automation, Henan Polytechnic University, Shiji Road, Jiaozuo 454003, China
Yi Yang
Henan Key Laboratory of Intelligent Detection and Control of Coal Mine Equipment, School of Electrical Engineering and Automation, Henan Polytechnic University, Shiji Road, Jiaozuo 454003, China
Zhiwei Lin
School of Mathematics and Physics, Queen's University Belfast, University Road, Belfast BT7 1NN, UK
Tian Wang
Institute of Artificial Intelligence, Beihang University, Xueyuan Road, Beijing 100083, China
In a multi-agent system, the complex interaction among agents is one of the main difficulties in making optimal decisions. This paper proposes a new action-value function and a learning mechanism based on the optimal equivalent action of the neighborhood (OEAN) of a multi-agent system, in order to obtain optimal decisions from the agents. In the new Q-value function, the OEAN characterizes the equivalent interaction between the current agent and the others. To cope with the non-stationary environment that arises when the agents act simultaneously, the OEAN of the current agent is inferred by maximum a posteriori (MAP) estimation based on a hidden Markov random field (HMRF) model. A convergence analysis proves that, under the proposed iteration mechanism, the Q-value function approaches the global Nash equilibrium value. The effectiveness of the method is verified by a case study of top-coal caving. The experimental results show that the OEAN reduces the complexity of describing the agents' interactions, while the top-coal caving performance is improved significantly.
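For intuition, the following minimal Python sketch illustrates the general idea of conditioning an agent's Q-update on a single equivalent action summarizing its neighborhood. It is an illustrative assumption, not the paper's formulation: the tabular setting, the majority-vote aggregation, and the names `equivalent_action`, `q_update`, `ALPHA`, and `GAMMA` are all hypothetical, and the paper infers the OEAN by MAP estimation on an HMRF rather than by voting.

```python
# Sketch of an OEAN-style Q-update under simplifying assumptions: each agent's
# Q-table is indexed by (state, own_action, equivalent_action), so the joint
# action of all neighbors is compressed into one equivalent action.
from collections import Counter, defaultdict

ALPHA, GAMMA = 0.1, 0.9  # learning rate and discount factor (assumed values)

def equivalent_action(neighbor_actions):
    """Summarize the neighbors' actions by one equivalent action (majority vote here;
    the paper instead infers it via MAP on a hidden Markov random field)."""
    return Counter(neighbor_actions).most_common(1)[0][0]

def q_update(Q, state, action, neighbor_actions, reward, next_state, actions):
    """One tabular Q-learning step conditioned on the neighborhood's equivalent action."""
    e = equivalent_action(neighbor_actions)
    # Greedy bootstrap over the agent's own actions, holding the equivalent action fixed.
    best_next = max(Q[(next_state, a, e)] for a in actions)
    Q[(state, action, e)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action, e)])

# Usage: one update for an agent whose three neighbors mostly chose action 1.
Q = defaultdict(float)
q_update(Q, state=0, action=1, neighbor_actions=[1, 0, 1], reward=1.0,
         next_state=1, actions=[0, 1])
```

The point of the sketch is the dimensionality argument from the abstract: the Q-table grows with one equivalent-action index instead of one index per neighbor, which is what reduces the complexity of describing the agents' interactions.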