LGViT: A Local and Global Vision Transformer with Dynamic Contextual Position Bias Using Overlapping Windows

Qian Zhou; Hua Zou; Huanhuan Wu

doi:10.3390/app13031993

Applied Sciences (Feb 2023)

LGViT: A Local and Global Vision Transformer with Dynamic Contextual Position Bias Using Overlapping Windows

Qian Zhou,
Hua Zou,
Huanhuan Wu

Affiliations

Qian Zhou: School of Computer Science, Wuhan University, Wuhan 430072, China
Hua Zou: School of Computer Science, Wuhan University, Wuhan 430072, China
Huanhuan Wu: College of Information Engineering, Tarim University, Alar 843300, China

DOI: https://doi.org/10.3390/app13031993
Journal volume & issue: Vol. 13, no. 3
p. 1993

Abstract

Read online

Vision Transformers (ViTs) have shown their superiority in various visual tasks for the capability of self-attention mechanisms to model long-range dependencies. Some recent works try to reduce the high cost of vision transformers by limiting the self-attention module in a local window. As a price, the adopted window-based self-attention also reduces the ability to capture the long-range dependencies compared with the original self-attention in transformers. In this paper, we propose a Local and Global Vision Transformer (LGViT) that incorporates overlapping windows and multi-scale dilated pooling to robust the self-attention locally and globally. Our proposed self-attention mechanism is composed of a local self-attention module (LSA) and a global self-attention module (GSA), which are performed on overlapping windows partitioned from the input image. In LSA, the key and value sets are expanded by the surroundings of windows to increase the receptive field. For GSA, the key and value sets are expanded by multi-scale dilated pooling to promote global interactions. Moreover, a dynamic contextual positional encoding module is exploited to add positional information more efficiently and flexibly. We conduct extensive experiments on various visual tasks and the experimental results strongly demonstrate the outperformance of our proposed LGViT to state-of-the-art approaches.

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci

About the journal

Abstract

Keywords