IEEE Access (Jan 2024)
Enhancing Deep Compression of CNNs: A Novel Regularization Loss and the Impact of Distance Metrics
Abstract
Transfer learning models address two critical problems in deep learning. First, for small datasets, they reduce the risk of overfitting. Second, for large datasets, they reduce computational cost, since fewer iterations are required to train the model. However, standard transfer learning models such as VGGNet, ResNet, and GoogLeNet require significant memory and computational power, which limits their use on resource-constrained devices. This paper addresses this problem by compressing the transfer learning model through channel pruning. Computational cost is currently a greater concern than memory cost, and convolutional layers, although they contain relatively few parameters, account for most of the computation. We therefore focus on pruning convolutional layers to reduce computational cost. The total loss is a combination of prediction loss and regularization loss, where the regularization loss is the sum of the magnitudes of the parameter values. Because training minimizes the total loss, it must also drive down the regularization loss. Training therefore not only minimizes prediction error but also manages the magnitudes of the model's weights: important weights remain large to keep the prediction loss low, while unimportant weights shrink to reduce the regularization loss. Regularization thus adjusts parameter magnitudes at different rates depending on their importance, and pruning methods that select parameters by magnitude become more effective as a result. Standard $L_{1}$ and $L_{2}$ regularization act on individual parameters and therefore aid unstructured pruning, whereas structured pruning requires group regularization. To address this, we introduce a novel group regularization loss designed specifically for structured channel pruning. The new loss optimizes the pruning process by acting on entire groups of parameters belonging to a channel rather than on individual parameters, making structured pruning more efficient and targeted. The Custom Standard Deviation (CSD) of a channel is calculated by summing the absolute differences between each parameter value and the channel's mean value. To evaluate a channel's parameters, both its $L_{1}$ norm and its CSD are computed, and the proposed regularization loss for a channel in a convolutional layer is defined as the ratio of the $L_{1}$ norm to the CSD ($L_{1}\text{Norm}/\text{CSD}$). This formulation groups the regularization loss over all parameters within a channel, making the pruning process more structured and efficient. The proposed regularization loss improves pruning efficiency, enabling a 46.14% reduction in parameters and a 61.91% decrease in FLOPs. This paper also employs the K-Means algorithm for similarity-based pruning and evaluates three distance metrics: Manhattan, Euclidean, and Cosine. Results indicate that K-Means pruning with Manhattan distance yields a 35.15% reduction in parameters and a 49.11% decrease in FLOPs, outperforming Euclidean and Cosine distances with the same algorithm.
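For clarity, the channel-level quantities described in the abstract can be written out explicitly. The following is a sketch of the proposed loss for a single convolutional layer; the symbols $w_{c,i}$ (the $i$-th weight of channel $c$), $N_{c}$ (the number of weights in channel $c$), and the weighting coefficient $\lambda$ are notational assumptions, since the abstract does not fix a notation.

$$
\bar{w}_{c} = \frac{1}{N_{c}}\sum_{i=1}^{N_{c}} w_{c,i}, \qquad
\mathrm{CSD}_{c} = \sum_{i=1}^{N_{c}} \left| w_{c,i} - \bar{w}_{c} \right|, \qquad
\mathcal{L}_{\mathrm{reg}} = \sum_{c} \frac{\sum_{i=1}^{N_{c}} \left| w_{c,i} \right|}{\mathrm{CSD}_{c}},
$$

so that training minimizes a total objective of the form $\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{pred}} + \lambda \, \mathcal{L}_{\mathrm{reg}}$, with the $L_{1}$ norm and CSD computed per channel as stated above.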
Keywords