IEEE Access (Jan 2024)
Semantics-Guided and Saliency-Focused Learning of Perceptual Video Compression
Abstract
In recent years, video compression has attracted considerable interest. However, existing methods focus mainly on reconstructing videos with high fidelity, often at the expense of the perceptual quality experienced by human viewers. This paper presents a learnable perceptual video compression method that extends existing codecs, improving their perceptual coding ability by exploiting the importance of local semantics and foreground objects to human vision. Local semantics are incorporated into the coding system through a region-wise contrastive learning objective that compels the encoder to extract semantics-relevant information. To prevent foreground objects from being corrupted during compression, we prioritize low distortion in foreground regions, which are detected with an off-the-shelf visual saliency model. To strengthen the representation capacity of the convolution operator in our compression network, we introduce a recurrent-information-based adaptive convolution block, further improving compression efficiency. Extensive experimental results validate the superior perceptual coding performance of our approach.
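To make the region-wise contrastive objective concrete, the sketch below shows a minimal InfoNCE-style loss over region feature vectors: an anchor region's feature is pulled toward a semantically matching region (the positive) and pushed away from other regions (the negatives). This is an illustrative pure-Python sketch under our own assumptions (hypothetical feature vectors, a temperature `tau`), not the paper's actual implementation, which operates on encoder feature maps.

```python
import math

def cosine(u, v):
    # Cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def region_infonce(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style contrastive loss for one region:
    maximize similarity to the matching region (positive)
    relative to the other regions (negatives)."""
    logits = [cosine(anchor, positive) / tau]
    logits += [cosine(anchor, n) / tau for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    # Negative log-probability of picking the positive.
    return -(logits[0] - log_sum)
```

For example, an anchor feature `[1.0, 0.0]` with a near-identical positive `[1.0, 0.1]` yields a much lower loss than the same anchor paired with an unrelated positive such as `[0.0, 1.0]`, which is exactly the pressure that drives the encoder toward semantics-relevant features.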
Keywords