Slimmable Multi-Task Image Compression for Human and Machine Vision

Jiangzhong Cao; Ximei Yao; Huan Zhang; Jian Jin; Yun Zhang; Bingo Wing-Kuen Ling

doi:10.1109/ACCESS.2023.3261668

IEEE Access (Jan 2023)

Affiliations

Jiangzhong Cao: School of Information Engineering, Guangdong University of Technology, Guangzhou, China
Ximei Yao: School of Information Engineering, Guangdong University of Technology, Guangzhou, China
Huan Zhang: School of Information Engineering, Guangdong University of Technology, Guangzhou, China
Jian Jin: Alibaba-NTU Singapore Joint Research Institute, Nanyang Technological University, Jurong West, Singapore
Yun Zhang: School of Electronics and Communication Engineering, Sun Yat-sen University, Shenzhen, China
Bingo Wing-Kuen Ling: School of Information Engineering, Guangdong University of Technology, Guangzhou, China

DOI: https://doi.org/10.1109/ACCESS.2023.3261668
Journal volume & issue: Vol. 11
pp. 29946 – 29958

Abstract

In Internet of Things (IoT) communications, visual data are frequently processed among intelligent devices by artificial intelligence algorithms, which replace humans in analysis and decision-making and only occasionally require human scrutiny. However, because of the high redundancy of compressive encoders, existing image coding solutions for machine vision are inefficient at runtime. To balance rate-accuracy performance against the efficiency of image compression for machine vision, while still attaining high-quality reconstructed images for human vision, this paper introduces a novel slimmable multi-task compression framework for human and machine vision in visual IoT applications. First, image compression for human and machine vision under constraints on bandwidth, latency, and computational resources is modeled as a multi-task optimization problem. Second, slimmable encoders are employed for multiple human and machine vision tasks, in which the parameters of the sub-encoder for machine vision tasks are shared among all tasks and jointly learned. Third, to solve the feature matching problem between the latent representation and the intermediate features of deep vision networks, feature transformation networks are introduced as decoders for machine vision feature compression. Finally, the proposed framework is applied to scenarios that combine human and machine vision tasks, e.g., object detection and image reconstruction. Experimental results show that, in two vision task scenarios, the proposed method outperforms baselines and other image compression approaches on machine vision tasks with higher efficiency (shorter latency) while retaining comparable image reconstruction quality.
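
The parameter sharing described for the slimmable encoders can be illustrated with a minimal sketch. This is not the architecture from the paper: it assumes a single toy fully connected layer in which a narrower "sub-encoder" reuses the leading output channels of one shared weight matrix, and the class name SlimmableLinear, its methods, and the numeric values are all hypothetical.

```python
from typing import List


class SlimmableLinear:
    """Toy layer (hypothetical): narrower widths reuse the leading rows
    of a single shared weight matrix instead of owning separate weights."""

    def __init__(self, weights: List[List[float]], biases: List[float]) -> None:
        if len(weights) != len(biases):
            raise ValueError("one bias is required per output channel")
        self.weights = weights  # shape: [out_channels][in_features]
        self.biases = biases    # shape: [out_channels]

    def forward(self, x: List[float], width: int) -> List[float]:
        """Compute only the first `width` output channels."""
        if not 1 <= width <= len(self.weights):
            raise ValueError("width must be between 1 and out_channels")
        return [
            sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(self.weights[:width], self.biases[:width])
        ]


# Illustrative values only; nothing here is taken from the paper.
layer = SlimmableLinear(
    weights=[
        [1.0, 0.0, 2.0],
        [0.0, 1.0, -1.0],
        [0.5, 0.5, 0.5],
        [2.0, -1.0, 0.0],
    ],
    biases=[0.0, 1.0, 0.0, -1.0],
)
x = [1.0, 2.0, 3.0]

full_latent = layer.forward(x, width=4)  # full-width path
slim_latent = layer.forward(x, width=2)  # cheaper, narrower path

print(full_latent)
print(slim_latent)
```

In this sketch the narrow path performs half of the multiply-accumulate operations of the full path, and its output coincides with the leading channels of the full-width output because both paths read the same weights. That is the sense in which a sub-encoder can be cheaper at runtime while its parameters remain shared.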

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
