IEEE Access (Jan 2024)
Heuristic Compression Method for CNN Model Applying Quantization to a Combination of Structured and Unstructured Pruning Techniques
Abstract
Model compression has been an actively pursued research field in recent years, with the goal of deploying state-of-the-art deep neural networks on power-constrained and resource-limited devices, where a compressed model substantially reduces resource requirements without significant accuracy loss. Network pruning and weight quantization are well-known model compression methods. Our previous work demonstrated significant reductions in model size by applying a managed combination of structured and unstructured pruning. To reduce the model further, this paper introduces new heuristic methods that combine weight quantization with both structured and unstructured pruning while maintaining a given target accuracy. We experimentally evaluate the proposed method by applying it to the state-of-the-art CNN models VGGNet, ResNet, and DenseNet on the well-known CIFAR-10 dataset. In the best case in our experiments, the proposed method reduces model size by a factor of 28 and compression processing time by a factor of 76 compared to the brute-force search method.
Keywords