Hidden Markov Random Field for Multi-Agent Optimal Decision in Top-Coal Caving

Yi Yang; Zhiwei Lin; Bingfeng Li; Xinwei Li; Lizhi Cui; Keping Wang

doi:10.1109/ACCESS.2020.2984786

IEEE Access (Jan 2020)

Affiliations

Yi Yang: School of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo, China
Zhiwei Lin: School of Computing, Ulster University, Newtownabbey, U.K.
Bingfeng Li: School of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo, China
Xinwei Li: School of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo, China
Lizhi Cui: School of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo, China
Keping Wang: School of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo, China

DOI: https://doi.org/10.1109/ACCESS.2020.2984786
Journal volume & issue: Vol. 8
pp. 76596 – 76609

Abstract

Applying model-based learning to optimal decision-making in a multi-agent system is not trivial, because obtaining the ground truth needed to train a model of a complex environment is expensive or even impossible. For example, when learning the optimal action of hydraulic supports in top-coal caving, the optimal action for a given state is not accessible as ground truth because of the intricate processes involved. Treating the latent ground truth as a hidden variable is an effective approach in the hidden Markov model. This paper extends the hidden-variable treatment of ground truth to the multi-agent system and proposes a hidden Markov random field (HMRF) model with reinforcement learning for optimizing the action decisions of the agents. In the HMRF model, the input states and the output actions of the agents are treated as an observable random field and a latent Markov random field, respectively. Based on the HMRF model, the optimal decision is inferred by maximizing the posterior probability, with the prior probability obtained by Q-learning. Meanwhile, the parameters of the HMRF model are estimated by the expectation-maximization algorithm. In the top-coal caving experiment, the proposed method proves effective: the recall of top-coal improves markedly at the cost of only a very small increase in the rock rate. Furthermore, the proposed method is applied to the predator-prey problem in the Gym environment. The experimental results show that communication between agents through the HMRF increases the reward of the prey.
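
The inference step described in the abstract (a maximum-posterior decision over a latent field of actions, with a prior derived from Q-learning) can be illustrated with a minimal sketch. This is not the authors' implementation. The Boltzmann (softmax) form of the prior, the chain-shaped neighbourhood, the Potts coupling `beta`, and the use of iterated conditional modes as the optimizer are all assumptions made for illustration, and the function names are hypothetical. Parameter estimation by expectation-maximization is omitted.

```python
import numpy as np


def softmax_prior(q_row, temperature=1.0):
    """Turn one row of Q-values into a prior over actions.

    Assumption: a Boltzmann (softmax) mapping; the abstract only says the
    prior is obtained by Q-learning.
    """
    z = (q_row - q_row.max()) / temperature  # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()


def icm_map_actions(log_prior, log_lik, beta=1.0, n_sweeps=10):
    """Approximate MAP action labelling on a chain of agents.

    log_prior, log_lik: arrays of shape (n_agents, n_actions).
    beta: hypothetical Potts coupling that rewards an agent for choosing
          the same action as each of its chain neighbours.
    Uses iterated conditional modes (ICM), a simple coordinate-wise
    optimizer chosen here for brevity.
    """
    n_agents, _ = log_prior.shape
    unary = log_prior + log_lik
    actions = unary.argmax(axis=1)  # start from independent decisions
    for _ in range(n_sweeps):
        changed = False
        for i in range(n_agents):
            score = unary[i].copy()
            for j in (i - 1, i + 1):  # left and right neighbours on the chain
                if 0 <= j < n_agents:
                    score[actions[j]] += beta
            best = int(score.argmax())
            if best != actions[i]:
                actions[i] = best
                changed = True
        if not changed:  # converged to a local maximum
            break
    return actions


# Toy example: three agents, two actions; made-up Q-values.
q_values = np.array([[0.0, 2.0], [0.5, 0.0], [0.0, 2.0]])
log_prior = np.log(np.array([softmax_prior(row) for row in q_values]))
log_lik = np.zeros_like(log_prior)  # uninformative observations

independent = icm_map_actions(log_prior, log_lik, beta=0.0)
coupled = icm_map_actions(log_prior, log_lik, beta=1.0)
```

With `beta=0.0` each agent simply follows its own prior. With `beta=1.0` the middle agent, whose own preference is weak, is pulled toward the action chosen by both neighbours, which is the kind of coupling between agents that a Markov random field over actions is meant to capture.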

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
