Remote Sensing (Aug 2024)

Macaron Attention: The Local Squeezing Global Attention Mechanism in Tracking Tasks

  • Zhixing Wang,
  • Hui Luo,
  • Dongxu Liu,
  • Meihui Li,
  • Yunfeng Liu,
  • Qiliang Bao,
  • Jianlin Zhang

DOI
https://doi.org/10.3390/rs16162896
Journal volume & issue
Vol. 16, no. 16
p. 2896

Abstract

Unmanned aerial vehicle (UAV) tracking tasks find extensive utility across various applications. However, current Transformer-based trackers are generally tailored for diverse scenarios and lack designs specific to UAV applications. Moreover, because training tracking models is complex, existing models strive to improve tracking performance within limited scales, making it challenging to apply lightweight designs directly. To address these challenges, we introduce an efficient attention mechanism known as Macaron Attention, which we integrate into the existing UAV tracking framework to enhance the model’s discriminative ability within these constraints. Specifically, our attention mechanism comprises three components: fixed window attention (FWA), local squeezing global attention (LSGA), and conventional global attention (CGA), which together form a Macaron-style attention implementation. First, the FWA module addresses the multi-scale issue in UAV imagery by cropping tokens within a fixed window scale in the spatial domain. Second, in LSGA, to adapt to scale variation, we employ an adaptive clustering-based token aggregation strategy and design a “window-to-window” fusion attention model that integrates global attention with local attention. Finally, the CGA module is applied to prevent matrix rank collapse and improve tracking performance. Combining the FWA, LSGA, and CGA modules, we propose a new tracking model named MATrack. On the UAV123 benchmark, the primary evaluation dataset, MATrack achieves a success rate of 0.710 and a precision of 0.911.
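To make the FWA → LSGA → CGA pipeline described above concrete, the following is a minimal, illustrative sketch of a Macaron-style attention block on a square token grid. It is an assumption-laden reconstruction, not the authors' released implementation: module names, shapes, and hyper-parameters are invented for illustration, and simple average pooling stands in for the paper's adaptive clustering-based token aggregation in the LSGA step.

```python
# Hypothetical sketch of a Macaron-style attention block (FWA -> LSGA -> CGA).
# All names and hyper-parameters are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MacaronAttention(nn.Module):
    """Illustrative FWA -> LSGA -> CGA stack over a square token grid (side x side)."""

    def __init__(self, dim, side, window=4, clusters=16, heads=4):
        super().__init__()
        assert side % window == 0, "grid side must be divisible by the window size"
        self.side, self.window, self.clusters = side, window, clusters
        self.fwa = nn.MultiheadAttention(dim, heads, batch_first=True)   # local, per fixed window
        self.lsga = nn.MultiheadAttention(dim, heads, batch_first=True)  # tokens attend to squeezed tokens
        self.cga = nn.MultiheadAttention(dim, heads, batch_first=True)   # conventional global attention

    def forward(self, x):                       # x: (B, N, C) with N = side * side
        B, N, C = x.shape
        s, w = self.side, self.window

        # 1) FWA: crop the token map into fixed (w x w) windows and attend inside each window.
        xg = x.view(B, s // w, w, s // w, w, C).permute(0, 1, 3, 2, 4, 5)
        xg = xg.reshape(-1, w * w, C)           # (B * num_windows, w*w, C)
        xg = self.fwa(xg, xg, xg)[0] + xg
        x = xg.view(B, s // w, s // w, w, w, C).permute(0, 1, 3, 2, 4, 5).reshape(B, N, C)

        # 2) LSGA: squeeze tokens into a small aggregated set (average pooling here stands in
        #    for the adaptive clustering in the paper), then let every token query that set,
        #    fusing global context with the locally refined tokens.
        centers = F.adaptive_avg_pool1d(x.transpose(1, 2), self.clusters).transpose(1, 2)  # (B, k, C)
        x = self.lsga(x, centers, centers)[0] + x

        # 3) CGA: conventional global attention over all tokens.
        x = self.cga(x, x, x)[0] + x
        return x


if __name__ == "__main__":
    tokens = torch.randn(2, 16 * 16, 64)        # batch of 2, 16x16 token grid, 64-dim tokens
    out = MacaronAttention(dim=64, side=16)(tokens)
    print(out.shape)                            # torch.Size([2, 256, 64])
```

The residual connection after each stage is also an assumption; it keeps the sketch trainable but the actual placement of normalization and residuals in MATrack may differ.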

Keywords