Federated Reinforcement Learning Acceleration Method for Precise Control of Multiple Devices

Hyun-Kyo Lim; Ju-Bong Kim; Ihsan Ullah; Joo-Seong Heo; Youn-Hee Han

doi:10.1109/ACCESS.2021.3083087

IEEE Access (Jan 2021)

Affiliations

Hyun-Kyo Lim: Department of Interdisciplinary Program in Creative Engineering, Korea University of Technology and Education, Cheonan, South Korea
Ju-Bong Kim: Department of Computer Science Engineering, Korea University of Technology and Education, Cheonan, South Korea
Ihsan Ullah: Advanced Technology Research Center, Korea University of Technology and Education, Cheonan, South Korea
Joo-Seong Heo: Department of Interdisciplinary Program in Creative Engineering, Korea University of Technology and Education, Cheonan, South Korea
Youn-Hee Han: Department of Computer Science Engineering, Korea University of Technology and Education, Cheonan, South Korea

DOI: https://doi.org/10.1109/ACCESS.2021.3083087
Journal volume & issue: Vol. 9
pp. 76296 – 76306

Abstract

Nowadays, Reinforcement Learning (RL) is applied to various real-world tasks and attracts much attention in the fields of games, robotics, and autonomous driving. However, applying RL directly to real-world environments is very challenging and places a heavy burden on the devices involved. Because of the reality gap, a simulated environment does not perfectly match the real-world scenario, and additional learning cannot be performed. Therefore, an efficient approach is required for RL to find an optimal control policy and achieve better learning efficacy. In this paper, we propose federated reinforcement learning in a multi-agent environment that applies a new federation policy. The new federation policy allows multiple agents to perform learning and share their learning experiences (e.g., gradients and model parameters) with each other to raise their learning level. The Actor-Critic PPO algorithm is used with four RL simulation environments from OpenAI Gym: CartPole, MountainCar, Acrobot, and Pendulum. In addition, we ran real-world experiments with multiple Rotary Inverted Pendulum (RIP) devices to evaluate and compare the learning efficiency of the proposed scheme in both the simulated and real environments.
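
The abstract does not spell out the federation policy, so the following is only a minimal sketch of one common form of parameter sharing, plain averaging of model parameters across agents, and not the authors' method. The function name average_parameters, the dictionary layout, and the toy values are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical representation: each agent's model is a mapping from a
# parameter name to a flat list of weights.
Params = Dict[str, List[float]]


def average_parameters(agent_params: List[Params]) -> Params:
    """Return the element-wise mean of each named parameter across agents."""
    if not agent_params:
        raise ValueError("need at least one agent")
    averaged: Params = {}
    for name in agent_params[0]:
        # Collect this parameter's vector from every agent.
        vectors = [params[name] for params in agent_params]
        # zip(*vectors) groups the i-th weight of every agent together.
        averaged[name] = [sum(values) / len(values) for values in zip(*vectors)]
    return averaged


# Three hypothetical agents holding the same toy actor-critic layout.
agents = [
    {"actor": [1.0, 2.0], "critic": [0.0]},
    {"actor": [3.0, 4.0], "critic": [3.0]},
    {"actor": [5.0, 6.0], "critic": [6.0]},
]
shared = average_parameters(agents)
```

In a scheme of this kind, each agent would load the averaged parameters back into its own model before continuing to learn. Sharing gradients instead of parameters would follow the same pattern, with gradient vectors in place of weights.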

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
