Distributed Off-Policy Temporal Difference Learning Using Primal-Dual Method

Donghwan Lee; Do Wan Kim; Jianghai Hu

doi:10.1109/ACCESS.2022.3211395

IEEE Access (Jan 2022)

Affiliations

Donghwan Lee: Department of Electrical Engineering, KAIST, Daejeon, South Korea
Do Wan Kim: Department of Electrical Engineering, Hanbat National University, Daejeon, South Korea
Jianghai Hu: Department of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA

Journal volume & issue: Vol. 10
pp. 107077 – 107094

Abstract

The goal of this paper is to provide theoretical analysis and additional insights into a distributed temporal-difference (TD) learning algorithm for multi-agent Markov decision processes (MDPs) from a saddle-point viewpoint. Single-agent TD-learning is a reinforcement learning (RL) algorithm for evaluating a given policy based on reward feedback. In multi-agent settings, multiple RL agents act concurrently, and each agent receives its local rewards. Each agent aims to evaluate a given policy with respect to the global reward, defined as the average of the local rewards, by sharing learning parameters with other agents through random network communication. In this paper, we propose a distributed TD-learning algorithm based on a saddle-point framework and provide a rigorous finite-time convergence analysis of the algorithm and its solution using tools from optimization theory. The results provide a general and unified perspective on the distributed policy evaluation problem and theoretically complement previous works.
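The abstract describes the multi-agent policy evaluation setting only at a high level. The following Python sketch illustrates that setting under stated assumptions; it is not the authors' algorithm. It swaps in a simpler on-policy, tabular consensus scheme (local TD(0) updates plus random pairwise gossip) in place of the paper's off-policy primal-dual method. Assumed for illustration only: a small Markov chain P induced by the fixed policy, one state trajectory observed by all agents, a per-state step size 1/n^0.75, and the hypothetical helper names `true_values`, `sample_next`, and `distributed_td0`.

```python
import random


def true_values(P, r, gamma, iters=2000):
    """Exact policy evaluation: iterate V <- r + gamma * P V to its fixed point."""
    n = len(P)
    V = [0.0] * n
    for _ in range(iters):
        V = [r[s] + gamma * sum(P[s][t] * V[t] for t in range(n)) for s in range(n)]
    return V


def sample_next(P, s, rng):
    """Draw the next state s' ~ P[s][.] by inverse-CDF sampling."""
    u, acc = rng.random(), 0.0
    for t, p in enumerate(P[s]):
        acc += p
        if u < acc:
            return t
    return len(P[s]) - 1  # guard against floating-point round-off


def distributed_td0(P, local_rewards, gamma, steps, seed=0):
    """Hypothetical consensus-style tabular TD(0) for N agents.

    All agents observe one shared state trajectory generated by the fixed
    policy's transition matrix P, but agent i only sees local_rewards[i][s].
    After each step, one random pair of agents averages its value tables.
    """
    rng = random.Random(seed)
    n, N = len(P), len(local_rewards)
    V = [[0.0] * n for _ in range(N)]  # one value table per agent
    visits = [0] * n
    s = 0
    for _ in range(steps):
        s_next = sample_next(P, s, rng)
        visits[s] += 1
        alpha = 1.0 / visits[s] ** 0.75  # diminishing per-state step size
        for i in range(N):
            # Local TD(0) update using only agent i's own reward.
            td_error = local_rewards[i][s] + gamma * V[i][s_next] - V[i][s]
            V[i][s] += alpha * td_error
        # Random pairwise gossip: leaves the network-wide average unchanged.
        i, j = rng.sample(range(N), 2)
        avg = [(a + b) / 2 for a, b in zip(V[i], V[j])]
        V[i], V[j] = avg, list(avg)
        s = s_next
    return V


# Usage: 3-state chain, 4 agents with different local rewards (illustrative data).
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
local_rewards = [[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0],
                 [1.0, 0.0, 2.0]]
gamma = 0.8
V = distributed_td0(P, local_rewards, gamma, steps=60_000, seed=1)
global_r = [sum(col) / len(local_rewards) for col in zip(*local_rewards)]
V_star = true_values(P, global_r, gamma)  # value under the averaged reward
V_bar = [sum(col) / len(V) for col in zip(*V)]  # network-average estimate
print([round(v, 2) for v in V_bar])
print([round(v, 2) for v in V_star])
```

Because every agent uses the same step size and the TD(0) update is affine in the value table, the network average of the agents' tables follows centralized TD(0) on the global (averaged) reward, and gossip never changes that average; the gossip steps only pull the individual tables toward it.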

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639