Xibei Gongye Daxue Xuebao (Journal of Northwestern Polytechnical University), Apr 2022
A reconfigurable processor for mixed-precision CNNs on FPGA
Abstract
Existing accelerators for convolutional neural networks (CNNs) suffer from low computing efficiency because they cannot adapt to the computing modes and caching characteristics of mixed-precision quantized CNN models. To address this problem, we propose a reconfigurable CNN processor consisting of a reconfigurable computing unit, a flexible on-chip cache unit, and a macro-instruction set. The multi-core CNN processor can be reconfigured according to the structure of the CNN model and the constraints on reconfigurable resources, improving the utilization of computing resources. The elastic on-chip buffer, together with a data access scheme based on dynamically configured addresses, makes better use of on-chip memory. The macro-instruction set architecture (mISA) fully expresses the characteristics of mixed-precision CNN models and of the reconfigurable processor, reducing the complexity of mapping CNNs with different network structures and computing modes onto the processor. The proposed processor has been implemented on Ultra96-V2 and ZCU102 FPGAs and evaluated on the well-known VGG16 and ResNet-50 networks. On Ultra96-V2 it achieves throughputs of 216.6 GOPS and 214 GOPS and computing efficiencies of 0.63 GOPS/DSP and 0.64 GOPS/DSP, respectively, better efficiency than CNN accelerators based on a fixed bit-width. On ZCU102, the throughput and computing efficiency for ResNet-50 reach 931.8 GOPS and 0.40 GOPS/DSP, respectively. These results represent up to 55.4% higher throughput than state-of-the-art CNN accelerators.
Keywords