IEEE Access (Jan 2024)
Quality Enhancement Based Video Captioning in Video Communication Systems
Abstract
Video captioning is the task of automatically generating natural-language sentences that describe visual content. It has recently made substantial progress thanks to deep learning techniques. Most existing techniques focus mainly on the deep learning network architecture, whereas video quality and resolution have not been fully considered, even though they strongly affect captioning performance. Because video communication systems usually compress, and often down-sample, video to substantially reduce the data size, the original quality and resolution can be severely degraded. Hence, this paper analyzes the impact of compression and down-sampling on captioning and proposes a quality enhancement method for video captioning. First, the proposed method performs quality classification to estimate the quality of each frame. Next, super-resolution (SR) is applied to enhance the frames in terms of both quality and resolution. Finally, a video captioning network uses the enhanced frames to generate accurate sentences. Experimental results show that the proposed method drastically improves captioning performance when both the quality and the resolution of the input videos are randomly determined.
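The classify, enhance, then caption flow described above can be sketched as follows. This is a minimal, hypothetical Python sketch under stated assumptions, not the authors' implementation. Frames are plain grayscale pixel grids. `classify_quality` is a resolution-only rule standing in for the learned quality classifier and ignores compression level. `upscale_nearest` is nearest-neighbour up-sampling standing in for the SR network. The `captioner` callable is a caller-supplied placeholder for the captioning network. All of these names are illustrative.

```python
from typing import Callable, List

# Hypothetical representation: a grayscale frame as rows of pixel values.
Frame = List[List[float]]


def classify_quality(frame: Frame, target_height: int) -> int:
    """Hypothetical stand-in for a learned quality classifier.

    Returns the integer up-scaling factor the frame needs to reach
    target_height. Unlike the classifier described in the abstract,
    this rule looks only at resolution, not at compression artifacts.
    """
    return max(1, target_height // len(frame))


def upscale_nearest(frame: Frame, factor: int) -> Frame:
    """Nearest-neighbour up-sampling, a placeholder for an SR network."""
    out: Frame = []
    for row in frame:
        # Repeat each pixel horizontally, then repeat the row vertically.
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def enhance_frames(frames: List[Frame], target_height: int) -> List[Frame]:
    """Classify each frame, then enhance it by the predicted factor."""
    enhanced = []
    for frame in frames:
        factor = classify_quality(frame, target_height)
        enhanced.append(upscale_nearest(frame, factor))
    return enhanced


def caption_video(frames: List[Frame], target_height: int,
                  captioner: Callable[[List[Frame]], str]) -> str:
    """Enhance all frames first, then hand them to a captioning model."""
    return captioner(enhance_frames(frames, target_height))


if __name__ == "__main__":
    low_res = [[0.0, 1.0], [1.0, 0.0]]          # a 2x2 frame
    full_res = [[0.5] * 4 for _ in range(4)]    # a 4x4 frame
    dummy = lambda fs: f"{len(fs)} frame(s), height {len(fs[0])}"
    print(caption_video([low_res, full_res], 4, dummy))
```

Run as a script, this prints `2 frame(s), height 4`: the 2x2 frame is up-sampled by a factor of 2, while the 4x4 frame passes through unchanged. Per-frame classification lets frames that are already at the target quality skip the more expensive enhancement step, which matches the motivation for classifying frames before applying SR.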
Keywords