Lightweight Compression of Intermediate Neural Network Features for Collaborative Intelligence

Robert A. Cohen; Hyomin Choi; Ivan V. Bajic

doi:10.1109/OJCAS.2021.3072884

IEEE Open Journal of Circuits and Systems (Jan 2021)

Affiliations

Robert A. Cohen: School of Engineering Science, Simon Fraser University, Burnaby, Canada
Hyomin Choi: School of Engineering Science, Simon Fraser University, Burnaby, Canada
Ivan V. Bajic: School of Engineering Science, Simon Fraser University, Burnaby, Canada

DOI: https://doi.org/10.1109/OJCAS.2021.3072884
Journal volume & issue: Vol. 2, pp. 350–362

Abstract

In collaborative intelligence applications, part of a deep neural network (DNN) is deployed on a lightweight device such as a mobile phone or edge device, and the remaining portion of the DNN is processed where more computing resources are available, such as in the cloud. This paper presents a novel lightweight compression technique designed specifically to quantize and compress the features output by the intermediate layer of a split DNN, without requiring any retraining of the network weights. Mathematical models for estimating the clipping and quantization error of leaky-ReLU and ReLU activations at this intermediate layer are developed and used to compute optimal clipping ranges for coarse quantization. We also present a modified entropy-constrained design algorithm for quantizing clipped activations. When applied to popular object-detection and classification DNNs, the technique compressed the 32-bit floating-point intermediate activations down to 0.6 to 0.8 bits, while keeping the loss in accuracy to less than 1%. Compared with HEVC, the lightweight codec consistently provided better inference accuracy, by up to 1.3%. The performance and simplicity of this lightweight compression technique make it an attractive option for coding an intermediate layer of a split neural network for edge/cloud applications.
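The clip-then-quantize idea in the abstract can be illustrated with a minimal sketch. It is not the authors' error model or their entropy-constrained design algorithm: it uses a plain uniform quantizer, and the clipping range, number of levels, leaky-ReLU slope, and all function names (clip_and_quantize, dequantize, empirical_entropy_bits, leaky_relu) are assumptions chosen for illustration. The empirical entropy of the quantizer indices gives a rough estimate of the bits per element an ideal entropy coder would need.

```python
import math
import random
from collections import Counter


def leaky_relu(x, slope=0.1):
    """Leaky-ReLU activation; the slope value is an illustrative assumption."""
    return x if x >= 0 else slope * x


def clip_and_quantize(values, c_min, c_max, levels):
    """Clip each value to [c_min, c_max], then map it to an index in 0..levels-1
    with a uniform quantizer (a stand-in, not the paper's quantizer design)."""
    step = (c_max - c_min) / (levels - 1)
    indices = []
    for v in values:
        clipped = min(max(v, c_min), c_max)
        indices.append(int(round((clipped - c_min) / step)))
    return indices, step


def dequantize(indices, c_min, step):
    """Reconstruct approximate activations from quantizer indices."""
    return [c_min + i * step for i in indices]


def empirical_entropy_bits(indices):
    """Empirical entropy of the index stream, in bits per element."""
    counts = Counter(indices)
    n = len(indices)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def demo():
    # Synthetic pre-activations; real features would come from the split layer.
    rng = random.Random(0)
    activations = [leaky_relu(rng.gauss(-1.0, 1.0)) for _ in range(10000)]
    # Hypothetical clipping range and level count, picked by hand.
    indices, step = clip_and_quantize(activations, 0.0, 2.0, levels=4)
    recon = dequantize(indices, 0.0, step)
    mse = sum((a - r) ** 2 for a, r in zip(activations, recon)) / len(recon)
    print("bits per element (entropy estimate):", empirical_entropy_bits(indices))
    print("mean squared error:", mse)


if __name__ == "__main__":
    demo()
```

Because most activations in this synthetic example fall into the lowest bin, the index entropy is well below the 2 bits that four levels would need with fixed-length coding. Narrowing the clipping range trades more clipping error for a finer step size, which is the trade-off the abstract's error models are used to optimize.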

Published in IEEE Open Journal of Circuits and Systems

ISSN: 2644-1225 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electric apparatus and materials. Electric circuits. Electric networks
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=8784029
