An Efficient Hardware Implementation of Reinforcement Learning: The Q-Learning Algorithm

Sergio Spano; Gian Carlo Cardarilli; Luca Di Nunzio; Rocco Fazzolari; Daniele Giardino; Marco Matta; Alberto Nannarelli; Marco Re

doi:10.1109/ACCESS.2019.2961174

IEEE Access (Jan 2019)

Affiliations

Sergio Spano: Department of Electronic Engineering, University of Rome “Tor Vergata”, Rome, Italy
Gian Carlo Cardarilli: Department of Electronic Engineering, University of Rome “Tor Vergata”, Rome, Italy
Luca Di Nunzio: Department of Electronic Engineering, University of Rome “Tor Vergata”, Rome, Italy
Rocco Fazzolari: Department of Electronic Engineering, University of Rome “Tor Vergata”, Rome, Italy
Daniele Giardino: Department of Electronic Engineering, University of Rome “Tor Vergata”, Rome, Italy
Marco Matta: Department of Electronic Engineering, University of Rome “Tor Vergata”, Rome, Italy
Alberto Nannarelli: Department of Applied Mathematics and Computer Science, Danmarks Tekniske Universitet, 2800, Kgs. Lyngby, Denmark
Marco Re: Department of Electronic Engineering, University of Rome “Tor Vergata”, Rome, Italy

DOI: https://doi.org/10.1109/ACCESS.2019.2961174
Journal volume & issue: Vol. 7
pp. 186340 – 186351

Abstract

In this paper we propose an efficient hardware architecture that implements the Q-Learning algorithm and is suitable for real-time applications. Its main features are low power consumption, high throughput, and limited use of hardware resources. We also propose a technique based on approximated multipliers to reduce the hardware complexity of the algorithm. We implemented the design on a Xilinx Zynq UltraScale+ MPSoC ZCU106 Evaluation Kit. The implementation results are evaluated in terms of hardware resources, throughput, and power consumption. The architecture is compared with state-of-the-art Q-Learning hardware accelerators presented in the literature and achieves better results in speed, power, and hardware resources. Experiments using different sizes for the Q-Matrix and different wordlengths for the fixed-point arithmetic are presented. With a Q-Matrix of size 8 × 4 (8-bit data) we achieved a throughput of 222 MSPS (Mega Samples Per Second) and a dynamic power consumption of 37 mW, while with a Q-Matrix of size 256 × 16 (32-bit data) we achieved a throughput of 93 MSPS and a power consumption of 611 mW. Due to the small amount of hardware resources required by the accelerator, our system is suitable for multi-agent IoT applications. Moreover, the architecture can be used to implement the SARSA (State-Action-Reward-State-Action) Reinforcement Learning algorithm with minor modifications.
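For readers unfamiliar with the algorithm, the following is a minimal software sketch of the standard textbook Q-Learning update on a small Q-Matrix, together with the SARSA variant mentioned in the abstract. It is not the hardware architecture described in the paper: it uses floating-point arithmetic rather than fixed-point, and the function names, the learning rate `alpha`, the discount factor `gamma`, and the example values are all assumptions chosen for illustration.

```python
def q_learning_update(q, s, a, r, s_next, alpha, gamma):
    """Apply one Q-Learning step to q (a list of rows, one row per state)."""
    # Off-policy target: the best Q-value available in the next state.
    best_next = max(q[s_next])
    q[s][a] += alpha * (r + gamma * best_next - q[s][a])
    return q[s][a]


def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma):
    """Apply one SARSA step; the target uses the action actually taken next."""
    q[s][a] += alpha * (r + gamma * q[s_next][a_next] - q[s][a])
    return q[s][a]


def make_q_matrix(n_states, n_actions):
    """Return a zero-initialised Q-Matrix of n_states rows and n_actions columns."""
    return [[0.0] * n_actions for _ in range(n_states)]


# Illustrative values only: an 8 x 4 Q-Matrix, the smallest size quoted in the abstract.
q = make_q_matrix(8, 4)
q[1][2] = 1.0  # pretend state 1 already has one learned value
new_value = q_learning_update(q, s=0, a=3, r=1.0, s_next=1, alpha=0.5, gamma=0.5)
print(new_value)  # 0.5 * (1.0 + 0.5 * 1.0 - 0.0) = 0.75
```

The two functions differ only in how the next-state term is chosen (a maximum over the row for Q-Learning, a single indexed entry for SARSA), which is consistent with the abstract's remark that the architecture needs only minor modifications to support SARSA.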

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
