IEEE Open Journal of Circuits and Systems (Jan 2021)
ConvAix: An Application-Specific Instruction-Set Processor for the Efficient Acceleration of CNNs
Abstract
ConvAix is an application-specific instruction-set processor (ASIP) that enables the energy-efficient processing of convolutional neural networks (CNNs) while retaining substantial flexibility through its instruction-set architecture (ISA)-based design. By combining data-level parallelism (DLP), instruction-level parallelism (ILP), and subword parallelism, the proposed design offers sufficient processing power to execute state-of-the-art CNNs in real time. ConvAix's arithmetic logic units (ALUs) are C-programmable, offering the flexibility required to implement many different convolution layer types, e.g., depthwise-separable convolutions and residual blocks, as well as fully connected and pooling layers. The processor comprises a total of 256 ALUs and supports low-precision computations down to 4 bits. Furthermore, it exploits sparsity in feature maps and weights by zero-guarding redundant computations to maximize its energy efficiency. The processor was implemented in a modern 28 nm CMOS technology operating at a supply voltage of 1 V, with a resulting clock frequency of 513 MHz. The final design offers a precision-dependent peak throughput between 263 GOP/s (int16) and 1.1 TOP/s (int4), while consuming between 972 mW and 340 mW of power, resulting in effective energy efficiencies ranging from 176 GOP/s/W to 2 TOP/s/W. Well-known CNNs, such as AlexNet, MobileNet, and ResNet-18, are simulated based on the placed and routed netlist, achieving between 233 (AlexNet) and 69 (ResNet-18) frames per second for a batch size of 1, including the time for off-chip transfers.
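As a rough plausibility check, the peak throughput figures above can be reproduced from the ALU count and the clock frequency. The sketch below is not taken from the paper: the function name `peak_throughput_gops` is hypothetical, and it assumes that each ALU completes one multiply-accumulate (MAC) per cycle, that a MAC is counted as two operations, and that subword parallelism yields four int4 lanes per ALU. Under these assumptions the results agree with the stated 263 GOP/s and 1.1 TOP/s after rounding.

```python
def peak_throughput_gops(num_alus, clock_mhz, subwords_per_alu, ops_per_mac=2):
    """Peak throughput in GOP/s.

    Assumption (not from the source): every ALU lane completes one MAC
    per cycle, and one MAC counts as `ops_per_mac` operations.
    """
    ops_per_cycle = num_alus * subwords_per_alu * ops_per_mac
    # MHz * ops/cycle gives Mop/s; divide by 1e3 to get GOP/s.
    return ops_per_cycle * clock_mhz / 1e3


NUM_ALUS = 256    # from the abstract
CLOCK_MHZ = 513   # from the abstract

# Assumption: one int16 lane per ALU, four int4 lanes per ALU.
int16_gops = peak_throughput_gops(NUM_ALUS, CLOCK_MHZ, subwords_per_alu=1)
int4_gops = peak_throughput_gops(NUM_ALUS, CLOCK_MHZ, subwords_per_alu=4)

print(f"int16 peak: {int16_gops:.1f} GOP/s")
print(f"int4 peak:  {int4_gops / 1e3:.2f} TOP/s")
```

This covers only the peak figures. The effective energy efficiencies quoted in the abstract are not derived here, since they depend on measured utilization and power, which this sketch does not model.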
Keywords