IEEE Access (Jan 2023)

Efficient and Compact Representations of Deep Neural Networks via Entropy Coding

  • Giosue Cataldo Marino,
  • Flavio Furia,
  • Dario Malchiodi,
  • Marco Frasca

DOI
https://doi.org/10.1109/ACCESS.2023.3317293
Journal volume & issue
Vol. 11
pp. 106103 – 106125

Abstract

Matrix operations are nowadays central to many Machine Learning techniques, and in particular to Deep Neural Networks (DNNs), where the core of any inference is a sequence of dot-product operations. How to store these matrices and execute their operations efficiently is an increasingly important problem. In this article we propose two new lossless compression schemes for real-valued matrices that support efficient vector-matrix multiplication directly in the compressed format and are specifically suited to DNN compression. Building on several recent studies that reduce the complexity of DNN inference through weight pruning and quantization, our schemes are expressly designed to benefit from both, that is, from input matrices characterized by low entropy. In particular, our solutions are able to take advantage of the depth of the model: the deeper the model, the higher the efficiency. Moreover, we derive space upper bounds for both variants in terms of the source entropy. Experiments show that our tools compare favourably, in terms of energy and space efficiency, against state-of-the-art matrix compression approaches, including Compressed Linear Algebra (CLA) and Compressed Shared Elements Row (CSER), the latter explicitly proposed in the context of DNN compression.
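To illustrate the kind of saving that low-entropy weights make possible, the Python sketch below computes a dot product by first summing the inputs that share each distinct weight value and then multiplying once per value. This is a generic illustration of the low-entropy principle (in the spirit of shared-element approaches such as CSER), not the compression scheme proposed in the article; the function name lowentropy_dot and the 4-value quantization grid are assumptions made for the example.

    # Illustrative sketch only: exploiting repeated weight values (low entropy,
    # e.g. after pruning and quantization) to cut the number of multiplications
    # in a dot product. Not the article's compression format.
    import numpy as np

    def lowentropy_dot(x, w):
        """Dot product x . w using one multiplication per distinct value of w."""
        values, inverse = np.unique(w, return_inverse=True)  # distinct weight values
        partial_sums = np.zeros(len(values))
        np.add.at(partial_sums, inverse, x)                  # sum inputs sharing each weight
        return float(partial_sums @ values)                  # one multiply per distinct value

    rng = np.random.default_rng(0)
    w = rng.choice([0.0, -0.5, 0.25, 1.0], size=1024)        # quantized, pruned-like weights
    x = rng.standard_normal(1024)
    assert np.isclose(lowentropy_dot(x, w), float(x @ w))

The fewer distinct values the weight vector contains, the fewer multiplications are needed, which is why quantized and pruned (low-entropy) layers are the natural target for compressed-domain linear algebra.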

Keywords