Visual Intelligence (Dec 2024)
Spatial-temporal initialization dilemma: towards realistic visual tracking
Abstract
In this paper, we first investigate the spatial-temporal initialization dilemma in realistic visual tracking, a phenomenon that can adversely affect tracking performance. We characterize this phenomenon by comparing how trackers are initialized in existing tracking benchmarks and in real-world applications. Existing benchmarks provide trackers with offline sequences and expert annotations in the initial frame. In real-world applications, however, a tracker is often initialized by user annotations or by an object detector, which may provide rough and inaccurate initialization. Moreover, obtaining an annotation from external feedback introduces extra time costs, while the video stream does not pause to wait for it. We select four representative trackers and conduct comprehensive performance comparisons on popular datasets with simulated initialization to illustrate the initialization dilemma intuitively. We then propose a simple compensation framework to address this dilemma. The framework contains spatial-refine and temporal-chasing modules that mitigate the performance degradation caused by the initialization dilemma. Furthermore, the proposed framework is compatible with various popular trackers without retraining. Extensive experiments verify the effectiveness of our compensation framework.
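To make the two sources of degradation concrete, the following Python sketch shows one hypothetical way to simulate such an initialization. The ground-truth box is spatially perturbed (rough annotation), and the tracker receives it only after an annotation latency (the stream keeps running). The function `simulate_realistic_init` and its parameters (`latency_s`, `fps`, `max_shift`, `max_scale`) are illustrative assumptions, not the simulation protocol used in the paper.

```python
import math
import random
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h) in pixels


def simulate_realistic_init(gt_box: Box, latency_s: float, fps: float,
                            max_shift: float = 0.1, max_scale: float = 0.2,
                            rng: Optional[random.Random] = None) -> Tuple[Box, int]:
    """Hypothetical simulation of a rough, delayed initialization.

    Spatial: shift the box center by up to max_shift * (w, h) and rescale
    width and height independently by a factor in [1 - max_scale, 1 + max_scale].
    Temporal: annotating takes latency_s seconds while the stream keeps
    running, so the box is delivered to the tracker at a later frame index.
    """
    rng = rng or random.Random()
    x, y, w, h = gt_box

    # Spatial noise: perturb the center, then the size.
    cx = x + w / 2 + rng.uniform(-max_shift, max_shift) * w
    cy = y + h / 2 + rng.uniform(-max_shift, max_shift) * h
    nw = w * rng.uniform(1 - max_scale, 1 + max_scale)
    nh = h * rng.uniform(1 - max_scale, 1 + max_scale)
    noisy_box = (cx - nw / 2, cy - nh / 2, nw, nh)

    # Temporal delay: number of frames that elapse before the box arrives.
    start_frame = math.ceil(latency_s * fps)
    return noisy_box, start_frame


box, start = simulate_realistic_init((10.0, 20.0, 40.0, 60.0),
                                     latency_s=0.5, fps=30.0,
                                     rng=random.Random(0))
print(box, start)
```

Because the noisy box describes the target's position in frame 0 but is handed over at frame `start_frame`, a tracker initialized this way faces both the spatial error and any target motion that occurred during the delay.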
Keywords