IEEE Access (Jan 2022)

Convolved Quality Transformer: Image Quality Assessment via Long-Range Interaction Between Local Perception

  • Heeseok Oh,
  • Jinwoo Kim,
  • Taewan Kim,
  • Sanghoon Lee

DOI: https://doi.org/10.1109/ACCESS.2022.3209810
Journal volume & issue: Vol. 10, pp. 102968–102980

Abstract


Hybrid architectures that combine a convolutional neural network (CNN) with a Transformer are an emerging trend across vision tasks, pushing the limits of representation learning. Given the complementary mechanisms of CNNs and Transformers, such a combination is well suited to image quality assessment (IQA), which requires both local distortion perception and global quality aggregation; however, few studies have employed this approach. This paper presents an end-to-end CNN-Transformer hybrid model for full-reference IQA, named the convolved quality transformer (CQT). CQT is inspired by human perceptual characteristics and is designed to unify the advantages of CNNs and Transformers for quality score estimation. In CQT, convolutional layers specialize in extracting local distortion features, while the Transformer aggregates them through long-range interaction to estimate holistic quality. This process is repeated over multi-scale feature maps to capture quality representations sensitively. To verify that the submodules of CQT perform their roles properly, we analyze in depth, with attention visualization, how interactions between local distortions infer global quality. Finally, perceptually pooled information from the stage-wise feature embeddings yields the final quality level. Experimental results demonstrate that the proposed model outperforms previous data-driven approaches and generalizes well across standard datasets.
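
The sketch below illustrates the general pipeline the abstract describes: CNN stages extract local distortion features at multiple scales, a per-stage Transformer encoder relates those features through long-range attention, and the pooled stage-wise embeddings are regressed to a quality score. It is a minimal illustration only; the module names, layer sizes, and the simple reference-minus-distorted fusion are assumptions, not the authors' exact CQT configuration.

```python
# Minimal sketch of a CNN-Transformer hybrid for full-reference IQA.
# All design details (channel widths, stage count, difference fusion)
# are illustrative assumptions, not the published CQT architecture.
import torch
import torch.nn as nn


class HybridFRIQA(nn.Module):
    def __init__(self, dim=64, stages=3, heads=4):
        super().__init__()
        # CNN stages: local distortion feature extraction at multiple scales.
        self.cnn_stages = nn.ModuleList()
        in_ch = 3
        for _ in range(stages):
            self.cnn_stages.append(nn.Sequential(
                nn.Conv2d(in_ch, dim, kernel_size=3, stride=2, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(dim, dim, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            ))
            in_ch = dim
        # Per-stage Transformer encoders: long-range interaction between
        # local feature tokens to aggregate a holistic quality embedding.
        self.transformers = nn.ModuleList(
            nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True),
                num_layers=1)
            for _ in range(stages))
        # Regress the concatenated stage-wise embeddings to a scalar score.
        self.head = nn.Linear(dim * stages, 1)

    def forward(self, ref, dist):
        embeddings = []
        x_r, x_d = ref, dist
        for cnn, tr in zip(self.cnn_stages, self.transformers):
            x_r, x_d = cnn(x_r), cnn(x_d)
            # Difference map as a simple stand-in for distortion-aware fusion.
            tokens = (x_r - x_d).flatten(2).transpose(1, 2)  # (B, HW, C)
            tokens = tr(tokens)                               # long-range attention
            embeddings.append(tokens.mean(dim=1))             # pooled per stage
        return self.head(torch.cat(embeddings, dim=1))        # (B, 1) quality score


if __name__ == "__main__":
    model = HybridFRIQA()
    ref = torch.randn(2, 3, 64, 64)      # reference images
    dist = torch.randn(2, 3, 64, 64)     # distorted images
    print(model(ref, dist).shape)        # torch.Size([2, 1])
```
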

Keywords