IEEE Access (Jan 2023)

Integrated Imager and 3.22 μs/Kernel-Latency All-Digital In-Imager Global-Parallel Binary Convolutional Neural Network Accelerator for Image Processing

  • Ruizhi Wang,
  • Cheng-Hsuan Wu,
  • Makoto Takamiya

DOI
https://doi.org/10.1109/ACCESS.2023.3296429
Journal volume & issue
Vol. 11
pp. 74364 – 74378

Abstract


This paper presents an approach to ultralow-latency convolutional neural network (CNN) processing, which is critical for real-time image processing applications such as autonomous driving and virtual reality. Traditional CNN accelerators employing in/near-array-computing architectures (including in/near-memory-computing and in/near-sensor-computing) have struggled to meet real-time requirements because of the latency bottlenecks of conventional column-parallel processing. To address this challenge, we propose a novel all-digital in-imager global-parallel binary convolutional neural network (IIGP-BNN) accelerator. It employs a global-parallel processing concept, in which multiply-and-accumulate operations (MACs) are executed simultaneously across the imager array in a 2D manner, eliminating the additional latency of row-by-row processing and of data access from random access memories (RAMs). In this design, convolution and subsampling with a $3\times3$ kernel are completed in just nine steps of global-parallel processing, regardless of image size. This theoretically eliminates over 88.5% of the repeated row scans required by conventional column-parallel processing architectures, significantly reducing computing latency. We designed and prototyped a $30\times30$ integrated imager and IIGP-BNN accelerator IC in a $0.18~\mu\text{m}$ CMOS process. The prototype achieved a latency of $3.22~\mu\text{s}$/kernel for the first-layer convolution at a 1 V power supply and a 35.7 MHz clock frequency, a 35.6% latency reduction compared with state-of-the-art in/near-imager-computing works. The proposed global-parallel processing concept opens the possibility of processing high-resolution 4K and 8K images with the same ultralow latency, a significant advancement in high-speed image processing.
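The claim that a $3\times3$ convolution finishes in nine global-parallel steps "regardless of image size" can be illustrated with a small software analogy. In this sketch, each step applies one kernel weight to every pixel position at once (a whole-array shifted view) and accumulates into all outputs simultaneously. The step count therefore depends only on the kernel size, not on the image height or width. This is a minimal sketch under stated assumptions, not the authors' circuit. The 0/1 bit encoding of ±1 values, the XNOR-based binary MAC (a common binary-network formulation), the "valid" output size, and the function name `global_parallel_binary_conv3x3` are all hypothetical choices made for illustration. Subsampling is not modeled.

```python
import numpy as np


def global_parallel_binary_conv3x3(image_bits, kernel_bits):
    """Hypothetical sketch: binary 3x3 'valid' convolution in nine global steps.

    Assumed encoding: bit 1 represents +1 and bit 0 represents -1, for both
    pixels and weights. As in CNNs, the kernel is not flipped (cross-correlation).
    Returns (sums, steps), where sums lie in [-9, 9] with shape (H-2, W-2).
    """
    image_bits = np.asarray(image_bits)
    kernel_bits = np.asarray(kernel_bits)
    h, w = image_bits.shape
    out_h, out_w = h - 2, w - 2
    matches = np.zeros((out_h, out_w), dtype=np.int32)
    steps = 0
    for dy in range(3):
        for dx in range(3):
            # One "global" step: the same weight is applied to every output
            # position at once via a shifted view of the whole image.
            window = image_bits[dy:dy + out_h, dx:dx + out_w]
            # XNOR: 1 where pixel bit equals weight bit, i.e. the +/-1 product is +1.
            matches += (window == kernel_bits[dy, dx])
            steps += 1
    # Sum of +/-1 products = (#matches) - (#mismatches) = 2 * matches - 9.
    return 2 * matches - 9, steps


rng = np.random.default_rng(0)
kernel = rng.integers(0, 2, size=(3, 3))
for size in (30, 256):  # step count stays at 9 for any image size
    img = rng.integers(0, 2, size=(size, size))
    sums, steps = global_parallel_binary_conv3x3(img, kernel)
    print(size, sums.shape, steps)
```

The loop count is fixed by the kernel (nine iterations), while all per-pixel work inside each iteration happens in parallel. This mirrors, in software, the idea that latency does not grow with the number of rows. For scale, and assuming latency equals clock cycles times clock period, the reported $3.22~\mu\text{s}$/kernel at 35.7 MHz corresponds to roughly $3.22\times10^{-6}\times35.7\times10^{6}\approx115$ clock cycles per kernel.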

Keywords