Applied Sciences (Sep 2023)

SANA: Sensitivity-Aware Neural Architecture Adaptation for Uniform Quantization

  • Mingfei Guo,
  • Zhen Dong,
  • Kurt Keutzer

DOI: https://doi.org/10.3390/app131810329
Journal volume & issue: Vol. 13, no. 18, p. 10329

Abstract

Uniform quantization is widely used as an efficient compression method in practical applications. Despite its merit of low computational overhead, uniform quantization fails to preserve sensitive components in neural networks when applied at ultra-low bit precision, which can lead to non-trivial accuracy degradation. Previous works have applied mixed-precision quantization to address this problem. However, finding the correct bit settings for different layers demands significant time and resources. Moreover, mixed-precision quantization is not well supported on current general-purpose hardware such as GPUs and CPUs and thus incurs intolerable overheads in deployment. To leverage the efficiency of uniform quantization while maintaining accuracy, in this paper, we propose sensitivity-aware network adaptation (SANA), which automatically modifies the model architecture based on sensitivity analysis to make it more compatible with uniform quantization. Furthermore, we formulate four different channel initialization strategies to accelerate the quantization-aware fine-tuning process of SANA. Our experimental results show that SANA outperforms standard uniform quantization and other state-of-the-art quantization methods in terms of accuracy, with comparable or even smaller memory consumption. Notably, ResNet-50-SANA (24.4 MB) with W4A8 quantization achieved 77.8% top-1 accuracy on ImageNet, which even surpassed the 77.6% of the full-precision ResNet-50 (97.8 MB) baseline.
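For readers unfamiliar with the scheme, the sketch below illustrates per-tensor symmetric uniform quantization, the kind of fake-quantization step commonly used in quantization-aware training. It is a minimal illustration, not the paper's implementation; the `uniform_quantize` helper and the per-tensor scale are assumptions made for clarity. A single shared scale makes the operator cheap, which is why it quantizes every channel identically and can hurt sensitive components at ultra-low bit widths.

```python
import numpy as np

def uniform_quantize(x: np.ndarray, num_bits: int) -> np.ndarray:
    """Simulate symmetric uniform quantization (illustrative sketch).

    Rounds values onto a uniform grid of 2**num_bits signed levels,
    then dequantizes back to float, as in quantization-aware training.
    """
    qmax = 2 ** (num_bits - 1) - 1                      # e.g. 7 for 4-bit signed
    scale = np.abs(x).max() / qmax                      # one scale per tensor
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)   # snap to integer grid
    return q * scale                                    # dequantize for simulation

# W4A8 means 4-bit weights and 8-bit activations; here we quantize weights.
weights = np.random.randn(64, 3, 3, 3).astype(np.float32)
w_q = uniform_quantize(weights, num_bits=4)
print("max weight error:", np.abs(weights - w_q).max())
```

Mixed-precision methods avoid this uniformity by assigning each layer its own bit width, at the cost of the search and hardware-support problems described above; SANA instead keeps the uniform quantizer and adapts the architecture around it.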

Keywords