IEEE Access (Jan 2023)
Objective Weight Interval Estimation Using Adversarial Inverse Reinforcement Learning
Abstract
Many real-world problems are modeled as sequential decision-making problems with multiple competing objectives, and multi-objective reinforcement learning (MORL) has garnered attention as a solution to such problems. One of the challenges in obtaining the desired policy using MORL is that the priorities (hereafter, weights) for each objective must be designed in advance to scalarize the reward vector. Determining weights through trial and error burdens system designers, so methods that estimate weights are needed. Existing methods use inverse reinforcement learning (IRL), which is not scalable because it requires running reinforcement learning repeatedly until an optimal policy is obtained. This study proposes a weight interval estimation (WInter) method using adversarial IRL (AIRL). AIRL is a scalable framework that reduces the computational cost of IRL by estimating rewards and policies simultaneously. WInter estimates the weight interval using the neighborhoods of the expert obtained during AIRL training. Experiments in a continuous-state benchmark environment for multi-objective sequential decision-making show that WInter successfully estimates the weight interval while reducing computational cost compared with existing methods.
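To make the role of the weights concrete, the sketch below illustrates linear scalarization, the most common scalarization in MORL. The abstract does not fix a specific scalarization function, so the weighted sum here is an assumption, and all names and values are illustrative.

```python
import numpy as np

def scalarize(reward_vector: np.ndarray, weights: np.ndarray) -> float:
    """Collapse a multi-objective reward vector into a scalar reward
    via a weighted sum (assumed linear scalarization)."""
    assert reward_vector.shape == weights.shape
    return float(np.dot(weights, reward_vector))

# Example: two competing objectives, e.g., task progress vs. energy use.
reward = np.array([1.0, -0.5])    # per-objective rewards at one time step
weights = np.array([0.7, 0.3])    # designer-chosen priorities (sum to 1)
print(scalarize(reward, weights)) # 0.55
```

Choosing the `weights` vector is exactly the design burden the paper targets: WInter aims to recover an interval of weights consistent with expert behavior rather than requiring the designer to fix them by trial and error.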
Keywords