Custom Network Quantization Method for Lightweight CNN Acceleration on FPGAs

Lingjie Yi; Xianzhong Xie; Yi Wan; Bo Jiang; Junfan Chen

doi:10.1155/2024/8018810

International Journal of Distributed Sensor Networks (Jan 2024)

Affiliations

Lingjie Yi: School of Computer Science and Technology
Xianzhong Xie: School of Computer Science and Technology
Yi Wan: School of Computer Science and Technology
Bo Jiang: School of Computer Science and Technology
Junfan Chen: Chongqing Haiyun Jiexun Technology

DOI: https://doi.org/10.1155/2024/8018810
Journal volume & issue: Vol. 2024

Abstract

Low-bit quantization can effectively reduce both the storage and the computation costs of deep neural networks. Existing quantization methods, however, yield unsatisfactory results when applied to lightweight networks. Additionally, after network quantization, differences in data types between operators can cause issues when deploying networks on Field Programmable Gate Arrays (FPGAs). Moreover, some operators cannot be accelerated heterogeneously on FPGAs, resulting in frequent switching between the Advanced RISC Machine (ARM) and FPGA environments for computation tasks. To address these problems, this paper proposes a custom network quantization approach. First, an improved PArameterized Clipping Activation (PACT) method is employed during quantization-aware training to restrict the value range of neural network parameters and reduce the precision loss caused by quantization. Second, the Consecutive Execution Of Convolution Operators (CEOCO) strategy is used to mitigate the resource consumption caused by frequent environment switching. The proposed approach is validated on the Xilinx Zynq UltraScale+ MPSoC 3EG and Virtex UltraScale+ XCVU13P platforms, with the MobileNetv1, MobileNetv3, PPLCNet, and PPLCNetv2 networks as testbeds. Experiments are conducted on the miniImageNet, CIFAR-10, and Oxford 102 Flowers public datasets. Compared with the original model, the proposed optimization methods reduce accuracy by 1.2% on average. Compared with the conventional quantization method, accuracy remains almost unchanged, while the frames per second (FPS) on FPGAs improves by an average of 2.1 times.
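To make the clipping step mentioned in the abstract concrete, the following is a minimal sketch of the original PACT forward pass, not the improved variant proposed in the paper, whose details are not given here. It assumes a clipping threshold `alpha` and a bit width `bits`; the function names are hypothetical. In quantization-aware training `alpha` would be a learned parameter updated through a straight-through estimator, which this sketch omits.

```python
def pact_code(x: float, alpha: float, bits: int) -> int:
    """Map an activation to its integer code under original PACT.

    Assumption: this is the baseline PACT formulation, not the paper's
    improved method. The value is clipped to [0, alpha] and then mapped
    onto 2**bits - 1 uniform steps.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if bits < 1:
        raise ValueError("bits must be at least 1")
    levels = 2 ** bits - 1                 # highest integer code
    clipped = min(max(x, 0.0), alpha)      # clip to [0, alpha]
    return round(clipped * levels / alpha)  # nearest integer code


def pact_quantize(x: float, alpha: float, bits: int) -> float:
    """Return the dequantized (fake-quantized) activation value."""
    levels = 2 ** bits - 1
    return pact_code(x, alpha, bits) * alpha / levels


if __name__ == "__main__":
    # Illustrative inputs only; alpha=6.0 and bits=4 are arbitrary choices.
    for value in (-1.0, 1.3, 10.0):
        print(value, pact_code(value, 6.0, 4), pact_quantize(value, 6.0, 4))
```

Negative inputs collapse to code 0 and inputs above `alpha` saturate at the highest code, so a smaller `alpha` gives finer resolution inside the clipping range at the cost of more saturation.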

Published in International Journal of Distributed Sensor Networks

ISSN: 1550-1329 (Print); 1550-1477 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://onlinelibrary.wiley.com/journal/dsn
