IEEE Access (Jan 2021)
Offline Multi-Policy Gradient for Latent Mixture Environments
Abstract
Reinforcement learning has been widely applied to sequential decision-making problems in real-world domains such as recommendation and e-learning. Many such applications involve multiple policies, latent mixture environments, and offline learning, and together these features pose a new challenge for reinforcement learning. To address this challenge, this paper proposes a reinforcement learning approach called offline multi-policy gradient for latent mixture environments. The proposed method optimizes the expected trajectory return with respect to the joint distribution of trajectories and models, and adopts an expectation-maximization-based multi-policy search algorithm to find the optimal policies. We also prove that the off-policy techniques of importance sampling and the advantage function can be applied to offline multi-policy learning from fixed historical trajectories. Experiments on both synthetic and real datasets demonstrate the effectiveness of our approach.
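To make the stated objective concrete, one plausible formalization is sketched below; the notation ($m$, $\tau$, $\theta_m$, $R$) is assumed for illustration and may differ from the paper's own.

% Hedged sketch of the objective described in the abstract:
% m indexes latent environment models, \tau is a trajectory,
% \theta_m parameterizes the policy paired with model m, and
% R(\tau) is the trajectory return. All symbols are assumptions.
\[
J(\theta) \;=\; \mathbb{E}_{(\tau,\, m) \sim p(\tau,\, m)}\left[ R(\tau) \right]
\;=\; \sum_{m} p(m)\, \mathbb{E}_{\tau \sim p(\tau \mid m;\, \theta_m)}\left[ R(\tau) \right]
\]

Under this reading, expectation maximization would alternate between inferring the model responsibilities $p(m \mid \tau)$ for each historical trajectory and improving each policy $\theta_m$ on the trajectories it is responsible for.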
Keywords