IEEE Access (Jan 2024)
HLQ: Hardware-Friendly Logarithmic Quantization Aware Training for Power-Efficient Low-Precision CNN Models
Abstract
With the development of deep learning and graphics processing units (GPUs), a wide range of convolutional neural network (CNN)-based computer vision studies have been conducted. Because CNN inference and training involve a large number of computations, network compression techniques, including quantization, have been actively studied alongside the adoption of CNNs. Unlike conventional linear quantization, logarithmic quantization allows the multiply-accumulate (MAC) operations in the convolution (CONV) layers, which account for most of the computation in CNNs, to be replaced with additions, and it is well suited to low-precision quantization. In this paper, we propose a logarithmic quantization-aware training technique that effectively reduces quantization loss while maximizing the reduction in hardware resources and power consumption in both the forward and backward propagation of CNNs. In the forward pass, the proposed method minimizes the accuracy drop by allocating the rounding point with the least quantization loss during training; in the backward pass, it propagates an optimized gradient by scaling the gradients of parameters with high quantization loss. When ResNet-18, -34, and -50 are trained from scratch on the Tiny-ImageNet dataset with both weights and activations quantized to 4 bits by the proposed method, accuracy improvements of 0.88%, 0.48%, and 1.72%, respectively, are achieved over the full-precision baseline. In addition, RTL synthesis of the CONV acceleration unit for ResNet-18 shows that the proposed 4-bit quantization achieves a power saving of 82.3% compared to the full-precision baseline when computing ResNet-18.
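To illustrate why logarithmic quantization lets the CONV MAC be replaced by additions, the sketch below quantizes weights to signed powers of two, so that multiplying an activation by a weight reduces to scaling by 2^exp (a barrel shift in integer hardware). This is a minimal, generic power-of-two quantizer for illustration only; it does not reproduce the paper's HLQ rounding-point allocation or its gradient scaling in the backward pass, and the bit-width/exponent-range choices here are assumptions.

```python
import numpy as np

def log2_quantize(x, exp_min=-7, exp_max=0):
    """Quantize values to signed powers of two: x ~= sign * 2**exp.

    exp_min/exp_max are illustrative clipping bounds (e.g., a 4-bit
    code with one sign bit and a small exponent range); they are not
    the paper's exact format.
    """
    sign = np.sign(x)
    mag = np.abs(x)
    exp = np.zeros_like(mag, dtype=np.int32)
    nz = mag > 0
    # Round the log-magnitude to the nearest integer exponent.
    exp[nz] = np.round(np.log2(mag[nz])).astype(np.int32)
    exp = np.clip(exp, exp_min, exp_max)
    q = sign * np.exp2(exp.astype(np.float64))  # zeros stay zero (sign == 0)
    return sign, exp, q

def log_domain_dot(acts, w_sign, w_exp):
    """Inner product with power-of-two weights: each 'multiplication'
    is a scale by 2**exp, i.e., a shift-and-add in hardware."""
    return np.sum(w_sign * acts * np.exp2(w_exp.astype(np.float64)))

# Usage: the shift-and-add result matches the ordinary dot product
# with the quantized weights.
w = np.array([0.31, -0.07, 0.5, 0.0])
a = np.array([1.2, 0.8, -0.5, 2.0])
s, e, wq = log2_quantize(w)
print(np.dot(a, wq), log_domain_dot(a, s, e))
```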
Keywords