IEEE Access (Jan 2021)

Offline Multi-Policy Gradient for Latent Mixture Environments

  • Xiaoguang Li,
  • Xin Zhang,
  • Lixin Wang,
  • Ge Yu

DOI: https://doi.org/10.1109/ACCESS.2020.3045300
Journal volume & issue: Vol. 9, pp. 801–812

Abstract

Reinforcement learning has been widely applied to sequential decision-making problems in many real-world fields, including recommendation, e-learning, etc. The characteristics implied by many real applications, namely multiple policies, latent mixture environments, and offline learning, pose a new challenge for reinforcement learning. To address this challenge, this paper proposes a reinforcement learning approach called offline multi-policy gradient for latent mixture environments. The proposed method optimizes the expected return of a trajectory with respect to the joint distribution of trajectories and models, and adopts a multi-policy search algorithm based on expectation maximization to find the optimal policies. We also prove that the off-policy techniques of importance sampling and the advantage function can be used for offline multi-policy learning with fixed historical trajectories. The effectiveness of our approach is demonstrated by experiments on both synthetic and real datasets.
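The abstract outlines an EM-style scheme: an E-step that assigns each logged trajectory a responsibility under each latent model, and an M-step that updates each model's policy with an importance-sampling-weighted gradient computed from the fixed offline data. The sketch below is not the paper's algorithm; it is a minimal illustration of that general pattern under assumptions not stated in the abstract (linear-softmax policies over a discrete action set, trajectory-level importance weights, and a plain return in place of the advantage function). All function and variable names (`em_step`, `trajectories`, `policies`, `mixture`, `behavior_prob`) are hypothetical.

```python
import numpy as np

# Hypothetical sketch of one EM-style offline multi-policy gradient update.
# `trajectories` is a fixed batch of logged episodes; each episode is a list of
# (state_features, action, reward, behavior_prob) tuples. `policies` is a list
# of parameter matrices (one linear-softmax policy per latent model) and
# `mixture` is the prior over latent models.

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def action_probs(theta, state):
    # Linear-softmax policy over a small discrete action set (illustrative).
    return softmax(state @ theta)

def em_step(trajectories, policies, mixture, lr=0.01, gamma=0.99):
    K = len(policies)
    new_mixture = np.zeros(K)
    grads = [np.zeros_like(theta) for theta in policies]

    for traj in trajectories:
        # E-step: responsibility of each latent model for this trajectory,
        # proportional to the mixture weight times the trajectory's action
        # likelihood under that model's policy.
        log_resp = np.log(mixture + 1e-12)
        for k, theta in enumerate(policies):
            for s, a, r, b in traj:
                log_resp[k] += np.log(action_probs(theta, s)[a] + 1e-12)
        resp = softmax(log_resp)
        new_mixture += resp

        # M-step gradient: trajectory-level importance weight (target policy
        # probability over logged behavior probability) times the discounted
        # return, used here as a crude stand-in for the advantage term.
        for k, theta in enumerate(policies):
            G, w, t = 0.0, 1.0, 0
            for s, a, r, b in traj:
                G += (gamma ** t) * r
                w *= action_probs(theta, s)[a] / (b + 1e-12)
                t += 1
            for s, a, r, b in traj:
                p = action_probs(theta, s)
                grad_logp = np.outer(s, np.eye(len(p))[a] - p)
                grads[k] += resp[k] * w * G * grad_logp

    mixture = new_mixture / new_mixture.sum()
    policies = [theta + lr * g / len(trajectories)
                for theta, g in zip(policies, grads)]
    return policies, mixture
```

In this toy version the responsibilities couple the two steps: trajectories that a latent model explains well contribute more to that model's policy gradient, while the importance weights correct for the gap between the learned policies and the logged behavior policy, as is standard in off-policy learning from fixed data.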

Keywords