IEEE Access (Jan 2019)
STResNet_CF Tracker: The Deep Spatiotemporal Features Learning for Correlation Filter Based Robust Visual Object Tracking
Abstract
Constructing a robust appearance model of the visual object is a crucial task in visual object tracking. Recently, a growing number of studies have combined spatial features with temporal features to improve tracking performance, and these methods have successfully applied both kinds of features to the tracking problem. This paper presents a novel visual object tracking method that combines spatiotemporal features with correlation filters. The visual features of the target object are extracted by a spatial-temporal residual network (STResNet) appearance model with two sub-networks. The STResNet appearance model learns spatial features and temporal features separately, so that the spatial context surrounding the target object in each frame and the temporal relationship between successive frames can both be used to refine the appearance representation of the target object. Finally, the fused spatiotemporal feature from the STResNet appearance model is incorporated into the correlation filter for robust visual object tracking. The experimental results show that our method achieves similar or better performance compared with other tracking methods based on convolutional neural networks or correlation filters.
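The last step of the pipeline, feeding a fused spatiotemporal feature into a correlation filter, can be illustrated with a minimal sketch. It is not the STResNet_CF implementation: the two feature maps are random stand-ins for the outputs of the spatial and temporal sub-networks, fusion is assumed to be channel-wise concatenation (the abstract does not specify the fusion rule), and the filter is a generic closed-form multi-channel ridge-regression filter solved in the Fourier domain. All function names and parameters (`gaussian_response`, `train_filter`, `detect`, `estimate_shift`, `sigma`, `lam`) are hypothetical.

```python
import numpy as np


def gaussian_response(shape, sigma=2.0):
    """Desired filter output: a Gaussian peak at the patch centre."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))


def train_filter(features, target, lam=1e-2):
    """Closed-form multi-channel correlation filter (assumed, generic form).

    features: array of shape (C, H, W); target: array of shape (H, W).
    Returns the filter in the Fourier domain, shape (C, H, W).
    """
    F = np.fft.fft2(features, axes=(-2, -1))
    G = np.fft.fft2(target)
    numerator = G[None] * np.conj(F)
    # Energy summed over channels, plus ridge regularisation lam.
    denominator = np.sum(F * np.conj(F), axis=0).real + lam
    return numerator / denominator[None]


def detect(filt, features):
    """Apply the filter to a new feature map and return the response map."""
    Z = np.fft.fft2(features, axes=(-2, -1))
    return np.real(np.fft.ifft2(np.sum(filt * Z, axis=0)))


def estimate_shift(response):
    """Offset of the response peak from the patch centre, as (dy, dx)."""
    h, w = response.shape
    py, px = np.unravel_index(np.argmax(response), response.shape)
    return int(py) - h // 2, int(px) - w // 2


# Stand-ins for the spatial and temporal sub-network outputs (assumption).
rng = np.random.default_rng(0)
spatial = rng.standard_normal((4, 32, 32))
temporal = rng.standard_normal((4, 32, 32))

# Assumed fusion rule: concatenate along the channel axis.
fused = np.concatenate([spatial, temporal], axis=0)

filt = train_filter(fused, gaussian_response(fused.shape[1:]))

# Simulate target motion in the next frame with a circular shift.
next_frame = np.roll(fused, shift=(3, -5), axis=(1, 2))
print(estimate_shift(detect(filt, next_frame)))
```

In this sketch the target displacement is read off as the offset of the response peak from the patch centre. A practical tracker would also update the filter over time and handle scale changes, which are omitted here.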
Keywords