IEEE Access (Jan 2021)
Compressing Neural Networks With Inter Prediction and Linear Transformation
Abstract
Because deep neural networks are often deployed in resource-constrained environments, network compression has become an essential part of deep neural network research. In this paper, we identify a relationship between kernel weights, which we term Inter-Layer Kernel Correlation (ILKC): the kernel weights of two different convolution layers are substantially similar in shape and value. Based on this relationship, we propose a new compression method, Inter-Layer Kernel Prediction (ILKP), which exploits the similarity between kernel weights in convolutional neural networks to represent convolutional kernels with fewer bits. Furthermore, to effectively adapt the inter prediction scheme from video coding technology, we integrate a linear transformation into the prediction scheme, which significantly enhances compression efficiency. The proposed method achieved 93.77% top-1 accuracy with a $4.1\times$ compression ratio compared to the ResNet110 baseline model on CIFAR10. That is, a 0.04% top-1 accuracy improvement was achieved with a smaller memory footprint. Moreover, when combined with quantization, the proposed method achieved a $13\times$ compression ratio with little performance degradation compared to the ResNet baseline models trained on CIFAR10 and CIFAR100.
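As a rough illustration of the idea of inter prediction combined with a linear transformation, the sketch below predicts one kernel from a reference kernel and keeps only the prediction residual. It is a minimal sketch under stated assumptions, not the paper's implementation: the abstract does not specify the form of the transformation, so a per-kernel scale `a` and offset `b` fitted by least squares are assumed here, and the name `predict_kernel` and the synthetic kernels are hypothetical.

```python
import numpy as np

def predict_kernel(reference, target):
    """Fit target ~ a * reference + b by least squares (assumed form of the
    linear transformation) and return (a, b, residual)."""
    r = reference.ravel()
    t = target.ravel()
    # Design matrix with one column for the scale and one for the offset.
    A = np.stack([r, np.ones_like(r)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, t, rcond=None)
    residual = target - (a * reference + b)
    return a, b, residual

# Synthetic 3x3 kernels (illustrative data only): the target is a scaled,
# shifted, slightly perturbed copy of the reference.
rng = np.random.default_rng(0)
reference = rng.normal(size=(3, 3))
target = -0.5 * reference + 0.1 + 0.01 * rng.normal(size=(3, 3))

a, b, residual = predict_kernel(reference, target)

# The decoder would rebuild the kernel from the reference, (a, b), and the residual.
reconstructed = a * reference + b + residual
print("residual norm:", np.linalg.norm(residual))
print("target norm:  ", np.linalg.norm(target))
```

When two kernels are strongly correlated, the residual has a much smaller magnitude than the kernel itself, so it can plausibly be stored with fewer bits; this is the intuition the sketch is meant to convey, not a measurement of the reported compression ratios.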
Keywords