Complex & Intelligent Systems (Apr 2023)

Transformer tracking with multi-scale dual-attention

  • Jun Wang,
  • Changwang Lai,
  • Wenshuang Zhang,
  • Yuanyun Wang,
  • Chenchen Meng

DOI
https://doi.org/10.1007/s40747-023-01043-1
Journal volume & issue
Vol. 9, no. 5
pp. 5793–5806

Abstract

Transformer-based trackers greatly improve tracking success rate and precision. The attention mechanism in the Transformer can fully explore context information across successive frames. Nevertheless, it ignores the equally important local information and structured spatial information, and irrelevant regions may also affect the template features and search-region features. In this work, a multi-scale feature fusion network is designed with box attention and instance attention in a Transformer-based encoder–decoder architecture. After feature extraction, the local information and structured spatial information are learned by multi-scale box attention, and the global context information is explored by instance attention. Box attention samples grid features from the region of interest (ROI); it therefore focuses effectively on the ROI and avoids the influence of irrelevant regions during feature extraction. At the same time, instance attention attends to context information across successive frames and avoids falling into local optima; long-range feature dependencies are learned at this stage. Extensive experiments on six challenging tracking datasets, UAV123, GOT-10k, LaSOT, VOT2018, TrackingNet, and NfS, demonstrate the superiority of the proposed tracker MDTT. In particular, the proposed tracker achieves an AUC score of 64.7% on LaSOT, 78.1% on TrackingNet, and a precision score of 89.2% on UAV123, outperforming the baseline and most recent advanced trackers.
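To make the box-attention idea concrete, below is a minimal single-head PyTorch sketch of attention that samples a small grid of features from a predicted box and attends over the sampled positions. This is not the authors' implementation: the class name, the parameterization of the box (center offset plus size), the grid size, and all module names are illustrative assumptions for exposition only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BoxAttention(nn.Module):
    """Illustrative single-head box attention (names are assumptions).

    Each query predicts a box (center offset + size) relative to its
    reference point, bilinearly samples a g x g grid of features from
    that box, and attends over the sampled grid positions.
    """

    def __init__(self, dim, grid_size=2):
        super().__init__()
        self.grid_size = grid_size
        self.to_box = nn.Linear(dim, 4)                      # (dx, dy, w, h) per query
        self.to_attn = nn.Linear(dim, grid_size * grid_size) # weights over grid points
        self.proj = nn.Linear(dim, dim)

    def forward(self, queries, ref_points, feat_map):
        # queries:    (B, N, C)   query embeddings
        # ref_points: (B, N, 2)   reference (x, y) in [-1, 1] grid coordinates
        # feat_map:   (B, C, H, W) feature map to sample from
        B, N, C = queries.shape
        g = self.grid_size

        box = self.to_box(queries)                 # (B, N, 4)
        center = ref_points + box[..., :2].tanh()  # offset the reference point
        size = box[..., 2:].sigmoid()              # box half-extent in (0, 1)

        # Build a g x g sampling grid inside each predicted box.
        lin = torch.linspace(-1.0, 1.0, g, device=queries.device)
        gy, gx = torch.meshgrid(lin, lin, indexing="ij")
        grid = torch.stack([gx, gy], dim=-1).view(1, 1, g * g, 2)
        grid = center[:, :, None, :] + grid * size[:, :, None, :]  # (B, N, g*g, 2)

        # Bilinearly sample features at the grid points: (B, C, N, g*g).
        sampled = F.grid_sample(feat_map, grid, align_corners=False)
        sampled = sampled.permute(0, 2, 3, 1)      # (B, N, g*g, C)

        # Attend over the sampled grid positions inside each box.
        attn = self.to_attn(queries).softmax(dim=-1)      # (B, N, g*g)
        out = (attn.unsqueeze(-1) * sampled).sum(dim=2)   # (B, N, C)
        return self.proj(out)
```

Because attention is computed only over features sampled inside each predicted box, irrelevant background regions never enter the weighted sum, which matches the ROI-focusing behavior the abstract describes; a multi-scale variant would apply this module to feature maps at several resolutions and fuse the outputs.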

Keywords