Deep learning-guided video compression for machine vision tasks

Aro Kim; Seung-taek Woo; Minho Park; Dong-hwi Kim; Hanshin Lim; Soon-heung Jung; Sangwoon Kwak; Sang-hyo Park

doi:10.1186/s13640-024-00649-w

EURASIP Journal on Image and Video Processing (Sep 2024)


Affiliations

Aro Kim: School of Computer Science and Engineering, Kyungpook National University
Seung-taek Woo: School of Computer Science and Engineering, Kyungpook National University
Minho Park: School of Computer Science and Engineering, Kyungpook National University
Dong-hwi Kim: School of Computer Science and Engineering, Kyungpook National University
Hanshin Lim: Media Research Division, Electronics and Telecommunications Research Institute
Soon-heung Jung: Media Research Division, Electronics and Telecommunications Research Institute
Sangwoon Kwak: Media Research Division, Electronics and Telecommunications Research Institute
Sang-hyo Park: School of Computer Science and Engineering, Kyungpook National University

Journal volume & issue: Vol. 2024, no. 1
pp. 1 – 20

Abstract


In the video compression industry, video compression tailored to machine vision tasks has recently emerged as a critical area of focus. Given the distinct characteristics of machine vision, the current practice of directly applying conventional codecs is inefficient, because it also spends bits on regions that are unnecessary for the machine task. In this paper, we propose a framework that more appropriately encodes the video regions distinguished by machine vision in order to improve coding efficiency. To this end, the proposed framework consists of deep learning-based adaptive switch networks that guide the selection of efficient coding tools during video encoding. Experiments demonstrate that the proposed framework outperforms the benchmark of the latest standardization project, Video Coding for Machines (VCM), achieving an average Bjøntegaard delta (BD)-rate gain of 5.91% and up to a 19.51% BD-rate gain.
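As background for the BD-rate figures quoted above, the sketch below shows the classic Bjøntegaard delta-rate calculation: fit a cubic of log-bitrate against quality for the anchor and the tested codec, integrate both fits over the overlapping quality range, and convert the average log-rate difference into a percentage. This is the textbook method, not necessarily the exact evaluation tool used in the paper. The function names (`polyfit3`, `integrate`, `bd_rate`) are hypothetical, and using a task metric such as mAP instead of PSNR on the quality axis is an assumption about machine-vision evaluation. A negative result means bitrate savings, so a "5.91% BD-rate gain" corresponds to a BD-rate of about -5.91%.

```python
import math


def polyfit3(xs, ys):
    """Least-squares cubic fit; returns coefficients c0..c3 (lowest order first)."""
    n = 4
    # Normal equations (A^T A) c = A^T y for the basis 1, x, x^2, x^3.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    m = [row[:] + [b] for row, b in zip(ata, aty)]  # augmented matrix
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (m[i][n] - tail) / m[i][i]
    return coeffs


def integrate(coeffs, a, b):
    """Definite integral of sum(c_k * x^k) from a to b."""
    return sum(c / (k + 1) * (b ** (k + 1) - a ** (k + 1)) for k, c in enumerate(coeffs))


def bd_rate(anchor_rates, anchor_quality, test_rates, test_quality):
    """Bjontegaard delta rate in percent (negative = bitrate savings vs. anchor)."""
    if len(anchor_quality) < 4 or len(test_quality) < 4:
        raise ValueError("need at least 4 rate/quality points per curve")
    # Map quality values to roughly [-1, 1] to keep the normal equations well
    # conditioned; the *average* over an interval is unchanged by this affine map.
    qs = list(anchor_quality) + list(test_quality)
    center = (max(qs) + min(qs)) / 2
    scale = (max(qs) - min(qs)) / 2 or 1.0
    ua = [(q - center) / scale for q in anchor_quality]
    ut = [(q - center) / scale for q in test_quality]
    # Fit log-rate as a cubic function of (normalized) quality.
    coeff_a = polyfit3(ua, [math.log(r) for r in anchor_rates])
    coeff_t = polyfit3(ut, [math.log(r) for r in test_rates])
    # Integrate only over the quality range covered by both curves.
    lo = max(min(ua), min(ut))
    hi = min(max(ua), max(ut))
    if hi <= lo:
        raise ValueError("quality ranges do not overlap")
    avg_diff = (integrate(coeff_t, lo, hi) - integrate(coeff_a, lo, hi)) / (hi - lo)
    return (math.exp(avg_diff) - 1) * 100


if __name__ == "__main__":
    quality = [30.0, 33.0, 36.0, 39.0]  # hypothetical quality points
    anchor = [1000, 2000, 4000, 8000]   # hypothetical anchor bitrates
    test = [900, 1800, 3600, 7200]      # 10% fewer bits at every point
    print(f"BD-rate: {bd_rate(anchor, quality, test, quality):.2f}%")
```

With the hypothetical test curve spending 10% fewer bits at every quality point, `bd_rate` returns approximately -10.0, i.e., a 10% BD-rate gain. Fitting in the log-rate domain is what makes the result a relative (percentage) saving rather than an absolute bitrate difference.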

Published in EURASIP Journal on Image and Video Processing

ISSN: 1687-5176 (Print); 1687-5281 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics
Website: https://jivp-eurasipjournals.springeropen.com
