IEEE Access (Jan 2020)
AlphaGo Policy Network: A DCNN Accelerator on FPGA
Abstract
The game of Go has long been regarded as the most challenging game for artificial intelligence because of its enormous search space and the difficulty of evaluating board positions. In early 2016, AlphaGo's defeat of Lee Sedol became a milestone for artificial intelligence. AlphaGo's success lies in efficiently combining policy and value networks with Monte Carlo tree search (MCTS), where the deep convolutional neural networks (DCNNs) are trained by a combination of supervised learning and reinforcement learning. However, the large convolution operations are computationally intensive and typically require a powerful computing platform such as a graphics processing unit (GPU), which makes it challenging to apply DCNNs in resource-limited embedded systems. The field-programmable gate array (FPGA) is an appropriate platform for implementing real-time DCNN models, but limited bandwidth and on-chip memory are the bottlenecks for DCNN acceleration. In this article, an AlphaGo Policy Network is designed, and efficient hardware architectures are proposed to accelerate the DCNN model. The accelerator can be fitted to different FPGAs, providing a balance between processing speed and hardware resources. As an example, the AlphaGo Policy Network is implemented on the Xilinx VCU118 evaluation board, and the results show that our implementation achieves a performance of 3036.32 GOPS, with up to a 56x speedup over a CPU and a 22.4x speedup over a GPU.
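To make the workload concrete, the sketch below outlines a policy-network-style DCNN for the 19x19 Go board in PyTorch. The 48-plane input, 5x5 first layer, 3x3 middle layers, and 1x1 output layer follow the architecture published for AlphaGo's supervised-learning policy network; the layer and filter counts here are illustrative assumptions, not necessarily the exact configuration accelerated in this article.

```python
# Minimal sketch of an AlphaGo-style policy network (illustrative assumptions,
# not the configuration implemented in this article).
import torch
import torch.nn as nn


class PolicyNetwork(nn.Module):
    def __init__(self, in_planes: int = 48, filters: int = 192, num_layers: int = 13):
        super().__init__()
        # First layer: 5x5 convolution over the 48 input feature planes.
        layers = [nn.Conv2d(in_planes, filters, kernel_size=5, padding=2), nn.ReLU()]
        # Middle layers: 3x3 convolutions preserving the 19x19 spatial size.
        for _ in range(num_layers - 2):
            layers += [nn.Conv2d(filters, filters, kernel_size=3, padding=1), nn.ReLU()]
        # Final 1x1 convolution produces one logit per board intersection.
        layers += [nn.Conv2d(filters, 1, kernel_size=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_planes, 19, 19) feature planes encoding the board position.
        logits = self.body(x).flatten(1)      # (batch, 361)
        return torch.softmax(logits, dim=1)   # move probabilities over the 361 points


if __name__ == "__main__":
    net = PolicyNetwork()
    board_features = torch.zeros(1, 48, 19, 19)
    probs = net(board_features)
    print(probs.shape)  # torch.Size([1, 361])
```

Almost all of the computation in such a model is in the stacked convolutions, which is why the accelerator design focuses on the convolution layers.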
Keywords