IEEE Access (Jan 2023)
Feature Map Compression for Video Coding for Machines Based on Receptive Block Based Principal Component Analysis
Abstract
This paper presents a method for effectively compressing the intermediate-layer feature maps of a convolutional neural network for the potential structures of Video Coding for Machines, an emerging technology for future machine consumption applications. Most existing studies compress a single feature map and hence cannot fully account for both the global and local information within it. This limits how well performance is maintained in machine consumption tasks that analyze objects of various sizes in images and videos. To address this problem, a multiscale feature map compression method is proposed that consists of two major processes: receptive block based principal component analysis (RPCA) and uniform integer quantization. RPCA derives the complete set of basis kernels of a feature map and selects from it a set of major basis kernels that represent a sufficient percentage of the global or local information, according to the variable-size receptive blocks of each feature map. After each feature map is transformed using the set of major basis kernels, a uniform integer quantizer converts the 32-bit floating-point values of the major basis kernels, the corresponding RPCA coefficients, and a mean vector to five-bit integer representations. Experimental results show that the proposed method reduces the amount of feature map data by 99.30% with a loss of 8.30% in average precision (AP) on the OpenImageV6 dataset, and of 0.77% in $AP_{M}$ and 0.47% in $AP_{L}$ on the MS COCO 2017 validation set, while outperforming previous PCA-based feature map compression methods even at higher compression rates.
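As a rough illustration of the two processes named in the abstract, the following minimal sketch applies PCA to non-overlapping blocks of a feature map and then quantizes the results uniformly to five bits. It is not the authors' implementation: the function names (`block_pca`, `reconstruct`, `quantize`, `dequantize`), the single fixed block size, the cumulative-variance threshold `energy`, and the min/max quantizer range are all assumptions made for illustration. The paper's RPCA uses variable-size receptive blocks per feature map, which this sketch does not model.

```python
import numpy as np


def block_pca(fmap, block=4, energy=0.95):
    """PCA over non-overlapping block x block patches of a (C, H, W) feature map.

    Hypothetical simplification: one fixed block size for the whole map.
    Returns the kept basis kernels, the coefficients, the mean vector, and
    the layout needed to reassemble the map.
    """
    c, h, w = fmap.shape
    hb, wb = h // block, w // block
    x = fmap[:, :hb * block, :wb * block]  # crop to a multiple of the block size
    patches = (x.reshape(c, hb, block, wb, block)
                .transpose(0, 1, 3, 2, 4)
                .reshape(-1, block * block))
    mean = patches.mean(axis=0)
    centered = patches - mean
    # Rows of vt are the principal directions ("basis kernels"), sorted by variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = s ** 2
    total = var.sum()
    if total > 0:
        ratio = np.cumsum(var) / total
        # Smallest k whose cumulative variance ratio reaches `energy`.
        k = min(int(np.searchsorted(ratio, energy)) + 1, len(s))
    else:
        k = 1  # constant input: a single kernel is enough
    basis = vt[:k]
    coeffs = centered @ basis.T
    return basis, coeffs, mean, (c, hb, wb, block)


def reconstruct(basis, coeffs, mean, layout):
    """Invert block_pca (exactly, if all basis kernels were kept)."""
    c, hb, wb, block = layout
    patches = coeffs @ basis + mean
    return (patches.reshape(c, hb, wb, block, block)
                   .transpose(0, 1, 3, 2, 4)
                   .reshape(c, hb * block, wb * block))


def quantize(x, bits=5):
    """Uniform quantization of a float array to integers in [0, 2**bits - 1]."""
    lo, hi = float(x.min()), float(x.max())
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, lo, scale


def dequantize(q, lo, scale):
    """Map quantized integers back to approximate float values."""
    return q.astype(np.float32) * scale + lo
```

In this sketch, the basis kernels, the coefficients, and the mean vector would each be passed through `quantize` before storage, mirroring the three quantities the abstract lists. Lowering `energy` keeps fewer kernels and so trades reconstruction accuracy for size.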
Keywords