Ultimate Compression: Joint Method of Quantization and Tensor Decomposition for Compact Models on the Edge

Mohammed Alnemari; Nader Bagherzadeh

doi:10.3390/app14209354

Applied Sciences (Oct 2024)

Affiliations

Mohammed Alnemari: Department of Electrical Engineering and Computer Science, University of California, Irvine, CA 92697, USA
Nader Bagherzadeh: Department of Electrical Engineering and Computer Science, University of California, Irvine, CA 92697, USA

Journal volume & issue: Vol. 14, no. 20, p. 9354

Abstract

This paper proposes the “ultimate compression” method as a solution to the expensive computation and high storage costs that state-of-the-art neural network models incur at inference time. Our approach uniquely combines tensor decomposition techniques with binary neural networks to create efficient deep neural network models optimized for edge inference. The process consists of training floating-point models, applying tensor decomposition algorithms, binarizing the decomposed layers, and fine-tuning the resulting models. We evaluated our approach across various state-of-the-art deep neural network architectures on multiple datasets, including MNIST, CIFAR-10, CIFAR-100, and ImageNet. Our results demonstrate compression ratios of up to 169×, with only a small degradation in accuracy (1–2%) compared to binary models. We employed different optimizers for training and fine-tuning, including Adam and AdamW, and used gradient norm clipping to address the exploding gradient problem in decomposed binary models. A key contribution of this work is a novel layer sensitivity-based rank selection algorithm for tensor decomposition, which outperforms existing methods such as random selection and Variational Bayes Matrix Factorization (VBMF). We conducted comprehensive experiments using six different models and present a case study on crowd-counting applications, demonstrating the practical applicability of our method. The ultimate compression method outperforms both binary neural networks and tensor decomposition applied individually in terms of storage and computation costs. This positions it as one of the most effective options for deploying compact and efficient models on edge devices with limited computational resources and energy constraints.
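
The abstract does not give formulas, so the following is only a back-of-the-envelope sketch of why stacking low-rank factorization and binarization multiplies the storage savings, together with the usual form of gradient norm clipping. It assumes a single dense layer split into two factors (a simple stand-in, not the paper's tensor decomposition), 1-bit factor weights, and one full-precision scale per factor; the function names, the layer size, and the rank are hypothetical and the resulting numbers are illustrative, not results from the paper.

```python
import math


def dense_bits(m, n, bits=32):
    """Storage in bits of a dense m x n weight matrix at `bits` per weight."""
    return m * n * bits


def factored_binary_bits(m, n, rank, scale_bits=32):
    """Storage in bits of two 1-bit factors, U (m x rank) and V (rank x n).

    Assumption: each binarized factor also keeps one full-precision
    scaling value, hence the extra 2 * scale_bits.
    """
    return rank * (m + n) + 2 * scale_bits


def compression_ratio(m, n, rank):
    """Ratio of float32 dense storage to factored 1-bit storage."""
    return dense_bits(m, n) / factored_binary_bits(m, n, rank)


def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradients so its L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm or total == 0.0:
        return list(grads)  # already within the bound: leave unchanged
    scale = max_norm / total
    return [g * scale for g in grads]


if __name__ == "__main__":
    # Hypothetical 512 x 512 layer factored at rank 64.
    print(compression_ratio(512, 512, 64))
    # A gradient of norm 5 rescaled to norm 1.
    print(clip_by_global_norm([3.0, 4.0], 1.0))
```

In this simplified accounting the ratio depends only on the layer shape and the chosen rank, which is one way to see why the rank selection step highlighted in the abstract matters: a lower rank compresses more but leaves less capacity for fine-tuning to recover accuracy.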

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
