IEEE Access (Jan 2023)

Dual Feature Fusion Tracking With Combined Cross-Correlation and Transformer

  • Chao Che
  • Yanyun Fu
  • Wenxi Shi
  • Zhansheng Zhu
  • Deyong Wang

DOI
https://doi.org/10.1109/ACCESS.2023.3346044
Journal volume & issue
Vol. 11
pp. 144966 – 144977

Abstract

Siamese networks have been applied in various fields, notably object tracking, owing to their speed and accuracy. Siamese tracking networks rely on cross-correlation to compute a similarity score between the target template and the search region. However, because cross-correlation is a local matching operation, it cannot effectively capture global context information. Transformer-based feature fusion captures long-range dependencies and richer semantic information, but distinguishing the target from the background also requires localized edge information. Because cross-correlation fusion and Transformer fusion have complementary advantages, we combine them in a dual feature fusion tracker (SiamCT) that captures both the local correlations and the global dependencies between the target and the search region. Specifically, we construct two parallel feature fusion paths, one based on cross-correlation and one based on the Transformer. For cross-correlation fusion, we adopt the more efficient two-dimension pixel-wise cross-correlation (TDPC), which performs correlation along both the spatial and channel dimensions; this interaction of multidimensional information enables more accurate feature fusion. The fused features are then augmented by coordinate attention (CA) to obtain orientation-dependent positional information. For Transformer fusion, we introduce cos-based linear attention (ClA) to improve the Transformer's ability to acquire global context information. In extensive experiments, SiamCT outperforms existing leading methods on the GOT-10k, LaSOT, TrackingNet, and OTB100 benchmarks. In particular, on the GOT-10k benchmark it achieves an AO score of 70.6% and $SR_{0.5}$ and $SR_{0.75}$ scores of 80.5% and 65.9%, respectively, reaching state-of-the-art performance.
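
To make the local-matching idea concrete, the following is a minimal sketch of generic pixel-wise cross-correlation along the spatial dimension only, as commonly used in Siamese trackers. It is not the authors' TDPC, whose channel-dimension branch and exact formulation are not given in the abstract. The function name `pixelwise_correlation` and the nested-list tensor layout (channels x height x width) are assumptions made for illustration.

```python
def pixelwise_correlation(template, search):
    """Correlate every template pixel with every search pixel.

    template: nested lists shaped C x Hz x Wz (hypothetical layout)
    search:   nested lists shaped C x Hx x Wx
    returns:  nested lists shaped (Hz*Wz) x Hx x Wx, one response map
              per template position
    """
    channels = len(template)
    hz, wz = len(template[0]), len(template[0][0])
    hx, wx = len(search[0]), len(search[0][0])

    maps = []
    for i in range(hz):
        for j in range(wz):
            # The C-dimensional feature vector at template position (i, j)
            # acts as a 1x1 kernel slid over the search features.
            kernel = [template[c][i][j] for c in range(channels)]
            response = [
                [
                    sum(kernel[c] * search[c][y][x] for c in range(channels))
                    for x in range(wx)
                ]
                for y in range(hx)
            ]
            maps.append(response)
    return maps


# Toy usage: 2 channels, a 1x1 template, and a 2x2 search region.
template = [[[1.0]], [[2.0]]]
search = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
maps = pixelwise_correlation(template, search)
print(len(maps), len(maps[0]), len(maps[0][0]))
```

Each output map depends only on the dot product between one template pixel and one search pixel, which illustrates why correlation-style fusion is local and why the abstract pairs it with a Transformer path for global dependencies.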

Keywords