Applied Sciences (Nov 2023)
Transformer-Based Visual Object Tracking with Global Feature Enhancement
Abstract
With the rise of general-purpose models, transformers have been adopted as feature fusion networks in visual object tracking algorithms. In these trackers, self-attention is used for global feature enhancement, while cross-attention fuses the features of the template and search regions to capture global information about the object. However, studies have found that the features fused by cross-attention do not attend sufficiently to the object region. To strengthen cross-attention on the object region, an enhanced cross-attention (ECA) module is proposed for global feature enhancement. By computing the average attention score at each position of the fused feature sequence and assigning higher weights to positions with higher scores, the proposed ECA module enriches the feature information in the object region and further improves matching accuracy. In addition, to reduce the computational complexity of self-attention, orthogonal random features are introduced to implement a fast attention operation, which decomposes the attention matrix into a product of random non-linear functions of the original queries and keys. This module reduces spatial complexity and improves inference speed by avoiding the explicit construction of a quadratic attention matrix. Finally, a tracking method named GFETrack is proposed, comprising a Siamese backbone network and an enhanced attention mechanism. Experimental results show that the proposed GFETrack achieves competitive results on four challenging datasets.
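The two mechanisms summarized above can be sketched in NumPy. This is an illustrative approximation, not the paper's implementation: the exact per-position scoring in ECA and the `alpha` blending factor are assumptions, and the fast attention function follows the general Performer-style positive orthogonal random feature construction rather than GFETrack's specific design.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def enhanced_cross_attention(q, k, v, alpha=1.0):
    """ECA-style cross-attention sketch: fuse template/search features,
    then boost fused positions whose average attention score is high.
    The row-mean scoring and `alpha` are illustrative assumptions."""
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)            # (Lq, Lk) attention scores
    attn = softmax(logits, axis=-1)
    fused = attn @ v                          # fused feature sequence (Lq, dv)
    scores = logits.mean(axis=-1)             # average attention score per position
    weights = 1.0 + alpha * softmax(scores)   # higher score -> higher weight
    return fused * weights[:, None], weights

def fast_attention(q, k, v, m=64, seed=0):
    """Linear-time attention with positive orthogonal random features
    (a Performer-style sketch). The quadratic (Lq, Lk) attention matrix
    is never formed explicitly."""
    rng = np.random.default_rng(seed)
    d = q.shape[-1]
    # Stack orthogonal blocks until we have m projection rows.
    blocks = []
    while sum(b.shape[0] for b in blocks) < m:
        g = rng.normal(size=(d, d))
        qb, _ = np.linalg.qr(g)               # square Q: orthonormal rows
        blocks.append(qb)
    w = np.vstack(blocks)[:m] * np.sqrt(d)    # rescale to Gaussian-like norms

    def phi(x):
        # Positive random features approximating the softmax kernel.
        x = x / d ** 0.25                     # absorbs the 1/sqrt(d) scaling
        proj = x @ w.T
        return np.exp(proj - 0.5 * (x ** 2).sum(-1, keepdims=True)) / np.sqrt(m)

    pq, pk = phi(q), phi(k)                   # (Lq, m), (Lk, m)
    kv = pk.T @ v                             # (m, dv): linear in sequence length
    z = pq @ pk.sum(axis=0)                   # per-query normalizer
    return (pq @ kv) / z[:, None]
```

In `fast_attention`, memory scales with the number of random features `m` rather than with the product of the two sequence lengths, which is the source of the spatial-complexity reduction the abstract refers to.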
Keywords