IEEE Access (Jan 2024)

Multiplication-Free Lookup-Based CNN Accelerator Using Residual Vector Quantization and Its FPGA Implementation

  • Hiroshi Fuketa,
  • Toshihiro Katashita,
  • Yohei Hori,
  • Masakazu Hioki

DOI
https://doi.org/10.1109/ACCESS.2024.3432979
Journal volume & issue
Vol. 12
pp. 102470 – 102480

Abstract

In this paper, a table lookup-based computing technique is proposed to perform convolutional neural network (CNN) inference without multiplication, and its FPGA implementation is demonstrated as a proof-of-concept. Hardware dedicated to lookup-based dot product approximation (LDA) has previously been proposed to achieve energy-efficient CNN computation for edge AI applications. However, LDA has not been applied to complex AI tasks such as ImageNet image classification, because it degrades inference accuracy, particularly on such tasks. Therefore, this study proposes a new LDA technique based on residual vector quantization, called RLDA. Applying RLDA to a ResNet-18 model for ImageNet 1000-class classification keeps the inference accuracy degradation below 5%. In addition, the proposed RLDA-based CNN accelerator is implemented on a ZCU104 evaluation board, which includes a Zynq UltraScale+ FPGA and DDR4 DRAM. We compare the processing time of the proposed accelerator on ZCU104 with that of an NVIDIA Jetson AGX Orin and show that the processing time of the proposed accelerator is comparable to that of the Jetson Orin in the lower layers of the ResNet-18 model. Finally, the computational performance of the proposed accelerator is compared with that of conventional FPGA-based accelerators. The proposed accelerator does not require digital signal processor (DSP) blocks, which are often available in modern FPGAs. We demonstrate that the proposed accelerator achieves more than four times the performance of a conventional FPGA-based accelerator that does not use DSPs.
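The abstract describes lookup-based dot product approximation with residual vector quantization only at a high level. The following is a minimal sketch of that general principle under our own assumptions; it is not the authors' RLDA algorithm or accelerator design. The sketch makes several assumptions. Each vector is cut into subvectors of length d. Every subspace gets one codebook per residual stage. Codebooks are a zero codeword plus sampled training residuals, a crude stand-in for k-means. Encoding uses exact nearest-codeword search, which itself multiplies, so only the table-lookup accumulation is multiplication-free here. Because the weight vector w is fixed, each table entry (codeword · weight subvector) can be precomputed offline. All names (train_codebooks, encode, decode, build_tables, lookup_dot) and parameters (n, d, k, stages) are hypothetical.

```python
import random

def sub(a, b):
    return [p - q for p, q in zip(a, b)]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def split(v, d):
    # Cut a vector into consecutive subvectors of length d (len(v) % d == 0 assumed).
    return [v[i:i + d] for i in range(0, len(v), d)]

def nearest(codebook, v):
    # Exact nearest codeword by squared L2 distance. This step multiplies;
    # a multiplication-free encoder would replace it (assumption, not modeled).
    return min(range(len(codebook)),
               key=lambda j: sum((p - q) ** 2 for p, q in zip(v, codebook[j])))

def train_codebooks(samples, d, k, stages, rng):
    """books[m][s] holds k codewords for residual stage m, subspace s.
    Stage 0 quantizes each subvector; stage m quantizes what earlier stages left.
    Codewords: a zero vector plus k-1 sampled residuals (crude stand-in for k-means).
    The zero codeword guarantees a residual never grows from one stage to the next."""
    residuals = [split(x, d) for x in samples]
    n_sub = len(residuals[0])
    books = []
    for _ in range(stages):
        stage = []
        for s in range(n_sub):
            pool = [r[s] for r in residuals]
            cb = [[0.0] * d] + [list(c) for c in rng.sample(pool, k - 1)]
            for r in residuals:
                r[s] = sub(r[s], cb[nearest(cb, r[s])])
            stage.append(cb)
        books.append(stage)
    return books

def encode(books, x, d):
    # codes[s][m]: codeword index picked at stage m for subspace s (greedy residual encoding).
    codes = []
    for s, r in enumerate(split(x, d)):
        idx = []
        for stage in books:
            j = nearest(stage[s], r)
            idx.append(j)
            r = sub(r, stage[s][j])
        codes.append(idx)
    return codes

def decode(books, codes):
    # Reconstruction x_hat: per subspace, the sum of the chosen codewords of all stages.
    out = []
    for s, idx in enumerate(codes):
        acc = [0.0] * len(books[0][s][0])
        for stage, j in zip(books, idx):
            acc = [p + q for p, q in zip(acc, stage[s][j])]
        out.extend(acc)
    return out

def build_tables(books, w, d):
    # Offline: T[m][s][j] = <codeword j of (stage m, subspace s), w_s>.
    ws = split(w, d)
    return [[[dot(c, ws[s]) for c in stage[s]] for s in range(len(ws))] for stage in books]

def lookup_dot(tables, codes):
    # Online: <x_hat, w> is just a sum of precomputed table entries (additions only).
    return sum(tables[m][s][j] for s, idx in enumerate(codes) for m, j in enumerate(idx))

if __name__ == "__main__":
    rng = random.Random(0)
    n, d, k = 16, 4, 16  # hypothetical vector length, subvector length, codebook size
    train = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(256)]
    w = [rng.gauss(0, 1) for _ in range(n)]
    x = [rng.gauss(0, 1) for _ in range(n)]
    books = train_codebooks(train, d, k, stages=3, rng=random.Random(1))
    tables = build_tables(books, w, d)
    print("exact:", round(dot(x, w), 3))
    for m in (1, 2, 3):
        codes = encode(books[:m], x, d)
        print(m, "stage(s):", round(lookup_dot(tables, codes), 3))
```

In this sketch, adding residual stages can only refine the reconstruction of x, because the zero codeword keeps the residual error from growing. That is the intuition for why a residual codebook can recover accuracy lost by a single-stage codebook. The dot-product estimate itself is not guaranteed to improve at every stage.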

Keywords