Symmetry (Jul 2021)

Transition Based Discount Factor for Model Free Algorithms in Reinforcement Learning

  • Abhinav Sharma,
  • Ruchir Gupta,
  • K. Lakshmanan,
  • Atul Gupta

DOI
https://doi.org/10.3390/sym13071197
Journal volume & issue
Vol. 13, no. 7
p. 1197

Abstract


Reinforcement Learning (RL) enables an agent to learn control policies for achieving its long-term goals. One key parameter of RL algorithms is the discount factor, which scales down future cost in a state’s current value estimate. This study introduces and analyses a transition-based discount factor in two model-free reinforcement learning algorithms, Q-learning and SARSA, and shows their convergence using the theory of stochastic approximation for finite state and action spaces. This induces an asymmetric discounting that favours some transitions over others, which (1) allows faster convergence than the constant-discount-factor variants of these algorithms, as demonstrated by experiments on the Taxi and MountainCar environments, and (2) provides better control over whether the RL agent learns a risk-averse or risk-taking policy, as demonstrated in a Cliff Walking experiment.
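The abstract does not give the paper’s exact form of the transition-based discount, but the core idea can be sketched as replacing the constant γ in the Q-learning update with a function γ(s, a, s′) evaluated on each observed transition. Below is a minimal tabular sketch, assuming a hypothetical transition_discount function (the self-loop penalty inside it is purely illustrative and would be replaced by the paper’s definition):

```python
import numpy as np

def transition_discount(s, a, s_next, base_gamma=0.9):
    # Hypothetical gamma(s, a, s'): discount some transitions more
    # heavily than others. Here, as a placeholder, self-loop
    # transitions are discounted twice as strongly.
    return base_gamma * (0.5 if s_next == s else 1.0)

def q_learning_step(Q, s, a, r, s_next, alpha=0.1):
    # One tabular Q-learning update where the constant discount
    # factor is replaced by a transition-dependent one.
    gamma = transition_discount(s, a, s_next)
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

def sarsa_step(Q, s, a, r, s_next, a_next, alpha=0.1):
    # The analogous on-policy SARSA update, bootstrapping on the
    # action actually taken in s_next.
    gamma = transition_discount(s, a, s_next)
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```

Because transitions with a smaller γ(s, a, s′) contribute less future value to the estimate, assigning discounts asymmetrically across transitions is one way such a scheme could steer the learned policy toward risk-averse or risk-taking behaviour, as the abstract describes.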

Keywords