Advances in Multimedia (Jan 2024)
Fast Visual Tracking with Enhanced and Gradient-Guide Network
Abstract
Existing Siamese trackers express visual tracking through a cross-correlation operation between two neural networks. Although they have dominated the tracking field, this design causes two main problems. First, the adoption of deep architectures drives Siamese trackers to sacrifice speed for performance. Second, the template is fixed to its initial features; because it cannot be updated in a timely manner, performance depends entirely on the Siamese network’s matching ability. In this work, we propose a tracker called SiamMLG. First, we adopt the lightweight ResNet-34 as the backbone to improve the tracker’s speed by reducing computational complexity. Second, to compensate for the performance loss caused by the lightweight backbone, we embed SKNet, an attention mechanism, to filter out valueless features. Finally, we use a gradient-guide strategy to update the template in a timely manner. Extensive experiments on four large tracking datasets, VOT-2016, OTB100, GOT-10k, and UAV123, confirm that SiamMLG satisfactorily balances performance and efficiency: it scores 0.515 on GOT-10k while running at 55 frames per second, a speed nearly 3.6 times that of the state-of-the-art method.
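To make the cross-correlation step concrete, the following is a minimal NumPy sketch of how a Siamese tracker can score candidate locations by sliding template features over search-region features. It is an illustration under stated assumptions and not the SiamMLG implementation: the function names `cross_correlate` and `locate`, the feature shapes, and the use of plain arrays in place of backbone outputs are all hypothetical.

```python
import numpy as np


def cross_correlate(template, search):
    """Slide template features over search features and return a response map.

    template: array of shape (C, h, w), standing in for template-branch features.
    search:   array of shape (C, H, W), standing in for search-branch features.
    Both shapes are illustrative assumptions; h <= H and w <= W are required.
    """
    c, h, w = template.shape
    c2, H, W = search.shape
    if c != c2:
        raise ValueError("template and search must have the same channel count")
    if h > H or w > W:
        raise ValueError("template must not be larger than the search region")

    response = np.empty((H - h + 1, W - w + 1), dtype=np.float64)
    for i in range(response.shape[0]):
        for j in range(response.shape[1]):
            # Inner product between the template and the window at offset (i, j).
            response[i, j] = np.sum(template * search[:, i:i + h, j:j + w])
    return response


def locate(response):
    """Return the (row, col) offset with the highest response."""
    return np.unravel_index(np.argmax(response), response.shape)
```

In this sketch the template is never modified after it is created, which mirrors the fixed-template limitation described above: if the target's appearance drifts away from the stored template, the peak of the response map becomes less reliable.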