IEEE Access (Jan 2020)
AlphaGo Policy Network: A DCNN Accelerator on FPGA
Abstract
The game of Go has long been regarded as the most challenging game for artificial intelligence because of its enormous search space and the difficulty of evaluating board positions. In early 2016, AlphaGo's defeat of Lee Sedol became a milestone for artificial intelligence. AlphaGo's success lies in efficiently combining policy and value networks with Monte Carlo tree search (MCTS), where the deep convolutional neural networks (DCNNs) are trained by a combination of supervised learning and reinforcement learning. However, the large convolution operations are computationally intensive and typically require a powerful computing platform such as a graphics processing unit (GPU), which makes it challenging to apply DCNNs in resource-limited embedded systems. The field-programmable gate array (FPGA) is an appropriate platform for implementing real-time DCNN models, but limited bandwidth and on-chip memory are the bottlenecks for DCNN acceleration. In this article, an AlphaGo Policy Network is designed, and efficient hardware architectures are proposed to accelerate the DCNN model. The accelerator can be fitted to different FPGAs, providing a balance between processing speed and hardware resources. As an example, the AlphaGo Policy Network is implemented on the Xilinx VCU118 evaluation board, and the results show that our implementation achieves a performance of 3036.32 GOPS, with up to a 56x speedup over a CPU and a 22.4x speedup over a GPU.
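To make the workload concrete, the sketch below outlines a policy-network-style DCNN for the 19x19 Go board in PyTorch. The 48-plane input, 5x5 first layer, 3x3 middle layers, and 1x1 output layer follow the architecture published for AlphaGo's supervised-learning policy network; the layer and filter counts here are illustrative assumptions, not necessarily the exact configuration accelerated in this article.

```python
# Minimal sketch of an AlphaGo-style policy network (illustrative assumptions,
# not the configuration implemented in this article).
import torch
import torch.nn as nn


class PolicyNetwork(nn.Module):
    def __init__(self, in_planes: int = 48, filters: int = 192, num_layers: int = 13):
        super().__init__()
        # First layer: 5x5 convolution over the 48 input feature planes.
        layers = [nn.Conv2d(in_planes, filters, kernel_size=5, padding=2), nn.ReLU()]
        # Middle layers: 3x3 convolutions preserving the 19x19 spatial size.
        for _ in range(num_layers - 2):
            layers += [nn.Conv2d(filters, filters, kernel_size=3, padding=1), nn.ReLU()]
        # Final 1x1 convolution produces one logit per board intersection.
        layers += [nn.Conv2d(filters, 1, kernel_size=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_planes, 19, 19) feature planes encoding the board position.
        logits = self.body(x).flatten(1)      # (batch, 361)
        return torch.softmax(logits, dim=1)   # move probabilities over the 361 points


if __name__ == "__main__":
    net = PolicyNetwork()
    board_features = torch.zeros(1, 48, 19, 19)
    probs = net(board_features)
    print(probs.shape)  # torch.Size([1, 361])
```

Almost all of the computation in such a model is in the stacked convolutions, which is why the accelerator design focuses on the convolution layers.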
Keywords