IEEE Access (Jan 2023)

ANS: Assimilating Near Similarity at High Accuracy for Significant Deduction of CNN Storage and Computation

  • Wang Wang,
  • Xin Zhong,
  • Manni Li,
  • Zixu Li,
  • Yinyin Lin

DOI
https://doi.org/10.1109/ACCESS.2023.3256540
Journal volume & issue
Vol. 11
pp. 25415–25430

Abstract

Activation data size has grown rapidly with the development of convolutional neural networks (CNNs), accounting for ever-increasing storage requirements. Our insight is that non-zero values dominate activations and that their patterns exhibit near similarity. We propose the ANS method to compress activations in real time during both training and inference. A high compression ratio with little accuracy loss is achieved through several optimization strategies: determining the selection box (SB) size according to the proportion of zero values in each layer, learning and calibrating the similarity threshold dynamically, and using the mean value of similar SBs as the compressed value. A compression ratio of over 49% is achieved with an accuracy loss of less than 0.892%, along with a reduction in multiplications of more than 60%. Compared with three state-of-the-art compression methods on five mainstream CNN models, ANS improves the compression ratio by 3.2x over RLC5, 1.9x over GRLC, and 1.7x over ZVC. The ANS compressor and decompressor are implemented in Verilog and synthesized at a 28 nm node, indicating that ANS incurs low performance cost and hardware overhead. ANS modules can be seamlessly attached at the interface or deeply coupled into a DNN accelerator with a modified data path in the MAC array, achieving 38% and 56% reductions in energy consumption, respectively.
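To make the near-similarity idea concrete, below is a minimal Python sketch under stated assumptions: activations are partitioned into selection boxes (SBs), and consecutive SBs whose element-wise difference falls under a threshold are assimilated into their mean, so only one SB plus a flag is stored for the pair. The function name ans_compress_sketch, the pairwise-only comparison, and the sb_size/threshold parameters are illustrative assumptions; the paper's dynamic threshold calibration and zero-ratio-based SB sizing are not reproduced here.

import numpy as np

def ans_compress_sketch(activations, sb_size, threshold):
    """Hedged sketch of ANS-style compression (not the paper's exact algorithm).

    Partitions a flattened activation map into selection boxes (SBs); when two
    consecutive SBs are near-similar (max element-wise difference below
    `threshold`), their mean is stored once with a reuse flag, halving the
    storage for that pair.
    """
    flat = activations.reshape(-1, sb_size)  # assumes size divisible by sb_size
    compressed, flags = [], []
    i = 0
    while i < len(flat):
        if i + 1 < len(flat) and np.max(np.abs(flat[i] - flat[i + 1])) < threshold:
            compressed.append((flat[i] + flat[i + 1]) / 2)  # mean of similar SBs
            flags.append(1)  # 1: the following SB was assimilated into this one
            i += 2
        else:
            compressed.append(flat[i])
            flags.append(0)  # 0: SB stored verbatim
            i += 1
    return np.stack(compressed), np.array(flags)

# Usage: ~50% compression when neighboring SBs are near-similar.
acts = np.repeat(np.random.rand(8, 4), 2, axis=0).reshape(-1)  # duplicated SB pairs
sbs, flags = ans_compress_sketch(acts, sb_size=4, threshold=1e-3)  # 16 SBs -> 8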

Keywords