SRFormer: Efficient Yet Powerful Transformer Network for Single Image Super Resolution

Armin Mehri; Parichehr Behjati; Dario Carpio; Angel Domingo Sappa

doi:10.1109/ACCESS.2023.3328229

IEEE Access (Jan 2023)

Affiliations

Armin Mehri: Computer Vision Center, Autonomous University of Barcelona, Barcelona, Spain
Parichehr Behjati: Computer Vision Center, Autonomous University of Barcelona, Barcelona, Spain
Dario Carpio: ESPOL Polytechnic University, Guayaquil, EC, Ecuador
Angel Domingo Sappa: Computer Vision Center, Autonomous University of Barcelona, Barcelona, Spain

DOI: https://doi.org/10.1109/ACCESS.2023.3328229
Journal volume & issue: Vol. 11
pp. 121457 – 121469

Abstract

Recent breakthroughs in single image super resolution have investigated the potential of deep Convolutional Neural Networks (CNNs) to improve performance. However, CNN-based models suffer from limited receptive fields and an inability to adapt to the input content. Recently, Transformer-based models were presented, which demonstrated major performance gains in Natural Language Processing and Vision tasks while mitigating the drawbacks of CNNs. Nevertheless, the computational complexity of Transformers can grow quadratically for high-resolution images, and because they convert the image to a 1D structure and thereby ignore its original structure, it can be difficult to capture local context information and to adapt them for real-time applications. In this paper, we present SRFormer, an efficient yet powerful Transformer-based architecture. Several key design choices in the Transformer blocks and Transformer layers allow it to consider the original structure of the image (i.e., its 2D structure) while capturing both local and global dependencies without raising computational demands or memory consumption. We also present a Gated Multi-Layer Perceptron (MLP) Feature Fusion module that aggregates the features of different stages of Transformer blocks by focusing on inter-spatial relationships while adding only minor computational cost to the network. We have conducted extensive experiments on several super-resolution benchmark datasets to evaluate our approach. SRFormer demonstrates superior performance compared to state-of-the-art methods from both Transformer and Convolutional networks, with an improvement margin of $0.1 \sim 0.53$ dB. Furthermore, while SRFormer has almost the same model size as SwinIR, it outperforms SwinIR by 0.47% and runs inference in half the time. The code will be available on GitHub.
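
The quadratic growth mentioned in the abstract can be illustrated with a small counting sketch. It is not taken from the SRFormer code; the function name `attention_pairs` and its `patch` parameter are hypothetical, and the sketch assumes plain global self-attention in which every token attends to every other token of the flattened image.

```python
def attention_pairs(height: int, width: int, patch: int = 1) -> int:
    """Count query-key pairs for global self-attention over an image.

    Assumption (illustrative only): the image is split into non-overlapping
    patch x patch tokens and flattened into a 1D sequence, and every token
    attends to every token, so the cost scales with tokens ** 2.
    """
    tokens = (height // patch) * (width // patch)
    return tokens * tokens


if __name__ == "__main__":
    # Doubling the side length quadruples the token count,
    # which multiplies the number of attention pairs by 16.
    for side in (64, 128, 256):
        print(side, attention_pairs(side, side))
```

Under these assumptions, each doubling of the image side multiplies the number of attention pairs by 16, which is why global attention becomes costly at high resolution.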

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
