IEEE Access (Jan 2019)

An Efficient Hardware Implementation of Reinforcement Learning: The Q-Learning Algorithm

  • Sergio Spano,
  • Gian Carlo Cardarilli,
  • Luca Di Nunzio,
  • Rocco Fazzolari,
  • Daniele Giardino,
  • Marco Matta,
  • Alberto Nannarelli,
  • Marco Re

DOI
https://doi.org/10.1109/ACCESS.2019.2961174
Journal volume & issue
Vol. 7
pp. 186340 – 186351

Abstract


In this paper we propose an efficient hardware architecture that implements the Q-Learning algorithm and is suitable for real-time applications. Its main features are low power consumption, high throughput, and limited hardware resource usage. We also propose a technique based on approximated multipliers to reduce the hardware complexity of the algorithm. We implemented the design on a Xilinx Zynq UltraScale+ MPSoC ZCU106 Evaluation Kit and evaluated the implementation in terms of hardware resources, throughput, and power consumption. Compared with the state-of-the-art Q-Learning hardware accelerators presented in the literature, the architecture achieves better results in speed, power, and hardware resources. We present experiments using different sizes for the Q-Matrix and different wordlengths for the fixed-point arithmetic. With a Q-Matrix of size 8 × 4 (8-bit data) we achieved a throughput of 222 MSPS (Mega Samples Per Second) and a dynamic power consumption of 37 mW, while with a Q-Matrix of size 256 × 16 (32-bit data) we achieved a throughput of 93 MSPS and a power consumption of 611 mW. Because the accelerator requires few hardware resources, our system is suitable for multi-agent IoT applications. Moreover, the architecture can be used to implement the SARSA (State-Action-Reward-State-Action) Reinforcement Learning algorithm with minor modifications.
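The update that such an accelerator evaluates can be sketched in software as a bit-accurate reference model. The snippet below is a minimal sketch under stated assumptions, not the authors' design: it applies the standard Q-Learning rule Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)] to an 8 × 4 Q-Matrix of signed 8-bit words, using a hypothetical Q3.4 fixed-point format (FRAC_BITS = 4) with saturation. The names to_fixed, fx_mul and q_update are illustrative and do not come from the paper. Passing the next action a_next switches the target from the greedy maximum to the value of the action actually taken, which is the SARSA variant mentioned in the abstract.

```python
# Hypothetical fixed-point format (assumption, not taken from the paper):
# signed 8-bit words with 4 fractional bits (Q3.4).
WORD_BITS = 8
FRAC_BITS = 4
Q_MIN = -(1 << (WORD_BITS - 1))      # -128
Q_MAX = (1 << (WORD_BITS - 1)) - 1   # 127


def saturate(v: int) -> int:
    """Clamp a value to the signed WORD_BITS range, as saturating hardware would."""
    return max(Q_MIN, min(Q_MAX, v))


def to_fixed(x: float) -> int:
    """Convert a real number to the hypothetical Q3.4 integer representation."""
    return saturate(round(x * (1 << FRAC_BITS)))


def fx_mul(a: int, b: int) -> int:
    """Fixed-point multiply: the raw product has 2*FRAC_BITS fractional bits,
    so shift right (floor, via Python's arithmetic shift) to rescale."""
    return (a * b) >> FRAC_BITS


def q_update(q, s, a, r, s_next, alpha, gamma, a_next=None):
    """One Q-Learning update on q[s][a]; if a_next is given, a SARSA update.

    All arguments except the indices are fixed-point integers.
    """
    if a_next is None:
        target_q = max(q[s_next])        # Q-Learning: greedy max over next actions
    else:
        target_q = q[s_next][a_next]     # SARSA: value of the action actually taken
    td_error = r + fx_mul(gamma, target_q) - q[s][a]
    q[s][a] = saturate(q[s][a] + fx_mul(alpha, td_error))
    return q[s][a]


if __name__ == "__main__":
    q = [[0] * 4 for _ in range(8)]      # 8 states x 4 actions, all zero
    q[1][0] = to_fixed(2.0)
    alpha, gamma, r = to_fixed(0.5), to_fixed(0.875), to_fixed(1.0)
    q_update(q, 0, 0, r, 1, alpha, gamma)
    print(q[0][0] / (1 << FRAC_BITS))    # 1.375, same as the floating-point rule
```

In this example α = 0.5 is a power of two, so fx_mul(alpha, td_error) is equivalent to an arithmetic right shift by one bit. Restricting coefficients this way is one common method of removing a hardware multiplier; it is shown here only as an illustration and is not necessarily the approximated-multiplier technique proposed in the paper.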

Keywords