IEEE Access (Jan 2021)
End-to-End Correlation Tracking With Enhanced Multi-Level Feature Fusion
Abstract
Discriminative correlation filters (DCF) have drawn increasing interest in visual tracking. In particular, a few recent works treat the DCF as a special layer and embed it in a Siamese network for visual tracking. However, most of them learn target representations with shallow networks, which lack the robust semantic information of deeper layers and therefore fail to handle significant appearance changes. In this paper, we design a novel Siamese network that fuses high-level semantic features with low-level spatial detail features for correlation tracking. Specifically, we design a residual semantic embedding module that adaptively injects semantic information from high-level features into low-level features to guide the feature fusion. Furthermore, we adopt an effective and efficient channel attention mechanism to filter out noisy information and focus the network on the features that are most valuable for visual tracking. The overall architecture is trained end-to-end offline to adaptively learn target representations that not only encode both high-level semantic features and low-level spatial detail features but are also closely related to correlation filters. Experimental results on the widely used OTB2013, OTB2015, VOT2016, TC-128, and UAV123 benchmarks show that the proposed tracker performs favorably against several state-of-the-art trackers.
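For readers unfamiliar with correlation filters, the following is a minimal sketch of the standard closed-form ridge-regression DCF on a 1-D, single-channel signal. It is not the network described above: the function names (`dft`, `gaussian_label`, `train_filter`, `detect`, `estimate_shift`), the regularization value, and the label width are all assumptions chosen for illustration, and a naive DFT stands in for an FFT to keep the example dependency-free.

```python
import cmath
import math
import random


def dft(signal, inverse=False):
    """Naive O(n^2) discrete Fourier transform (illustrative stand-in for an FFT)."""
    n = len(signal)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += signal[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gaussian_label(n, sigma=2.0):
    """Desired response: a circular Gaussian peaked at index 0 (sigma is assumed)."""
    return [math.exp(-0.5 * (min(i, n - i) / sigma) ** 2) for i in range(n)]


def train_filter(x, y, lam=1e-4):
    """Closed-form ridge-regression filter, solved independently per frequency bin."""
    x_hat, y_hat = dft(x), dft(y)
    return [yh * xh.conjugate() / (abs(xh) ** 2 + lam)
            for xh, yh in zip(x_hat, y_hat)]


def detect(filter_hat, z):
    """Correlation response of the learned filter on a search signal z."""
    z_hat = dft(z)
    response = dft([h * zh for h, zh in zip(filter_hat, z_hat)], inverse=True)
    return [value.real for value in response]


def estimate_shift(x, z, lam=1e-4):
    """Train on template x, then return the index of the response peak on z."""
    response = detect(train_filter(x, gaussian_label(len(x)), lam), z)
    return max(range(len(response)), key=response.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    template = [rng.uniform(-1.0, 1.0) for _ in range(32)]
    shift = 5
    # The search signal is the template circularly shifted to the right.
    search = [template[(i - shift) % 32] for i in range(32)]
    print("estimated shift:", estimate_shift(template, search))
```

Because the filter is obtained by element-wise operations in the Fourier domain, the solve is cheap and differentiable, which is what makes it plausible to treat a DCF as a network layer. A tracker of the kind described here would presumably apply the same idea to 2-D, multi-channel feature maps rather than to a raw 1-D signal.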
Keywords