NRVC: Neural Representation for Video Compression with Implicit Multiscale Fusion Network

Shangdong Liu; Puming Cao; Yujian Feng; Yimu Ji; Jiayuan Chen; Xuedong Xie; Longji Wu

doi:10.3390/e25081167

Entropy (Aug 2023)

Affiliations

Shangdong Liu: School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
Puming Cao: School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
Yujian Feng: School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
Yimu Ji: School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
Jiayuan Chen: School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
Xuedong Xie: School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
Longji Wu: School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China

DOI: https://doi.org/10.3390/e25081167
Journal volume & issue: Vol. 25, no. 8, p. 1167

Abstract

End-to-end deep models for video compression have advanced steadily in recent years. However, they rely on lengthy, complex pipelines that contain numerous redundant parameters. Video compression approaches based on implicit neural representation (INR) represent a video directly as a function approximated by a neural network, which yields a more lightweight model. Their single, uniform feature extraction pipeline, however, limits the network’s ability to fit the mapping function for video frames. Hence, we propose a neural representation approach for video compression with an implicit multiscale fusion network (NRVC), which uses normalized residual networks to improve the effectiveness of INR in fitting the target function. We propose the multiscale representations for video compression (MSRVC) network, which extracts features from the input video sequence effectively so that the mapping function overfits the video more closely. Additionally, we propose the feature extraction channel attention (FECA) block, which captures interaction information between different feature extraction channels and further improves the effectiveness of feature extraction. The results show that, at a similar bits per pixel (BPP), NRVC achieves a 2.16% higher decoded peak signal-to-noise ratio (PSNR) than the NeRV method. Moreover, NRVC outperforms the conventional HEVC in terms of PSNR.
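The two metrics named in the abstract, PSNR and BPP, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `psnr` and `bits_per_pixel`, the flat pixel-list input, and the 8-bit peak value of 255 are assumptions made for illustration only.

```python
import math


def psnr(original, decoded, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    Assumes (hypothetically) flat sequences of pixel intensities and an
    8-bit peak value of 255 by default.
    """
    if len(original) != len(decoded) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the reference and the decoded pixels.
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


def bits_per_pixel(total_bits, num_frames, height, width):
    """Bits spent per pixel; for an INR codec, total_bits would be the
    size of the stored (e.g. quantized) network weights."""
    return total_bits / (num_frames * height * width)
```

For an INR-based codec the bit count comes from the stored model rather than from per-frame bitstreams, which is why comparisons such as the one against NeRV are made at similar BPP.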

Published in Entropy

ISSN: 1099-4300 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Astronomy: Astrophysics; Science: Physics
Website: http://www.mdpi.com/journal/entropy
