IEEE Access (Jan 2019)
Single-Stream Deep Similarity Learning Tracking
Abstract
Deep similarity tracking with two-stream or multi-stream network architectures has drawn great attention for its ability to extract discriminative features while balancing accuracy and speed. However, these networks require a careful data-pairing process and are usually difficult to update during online visual tracking. In this paper, we propose a simple and effective discriminative feature extractor via Single-Stream Deep Similarity learning for online visual Tracking, denoted SSDST. Unlike the popular two-stream or multi-stream architectures, the proposed method is built on a standard single-branch CNN, such as the VGG-M network. We design a contrastive loss layer, in which samples are implicitly paired, to learn discriminative features directly from a large video dataset. The network is easily adapted to online tracking by replacing the contrastive loss layer with a binary classification layer tailored to a specific video. SSDST is extensively evaluated on two representative benchmarks and compares favorably with both online trackers and two-stream or multi-stream trackers.
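To make the idea of a contrastive loss over implicitly paired samples concrete, the following is a minimal NumPy sketch. It assumes a standard margin-based contrastive formulation in which every pair of samples in a mini-batch is paired implicitly by comparing labels; the margin value, distance metric, and pairing scheme here are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np

def contrastive_loss(features, labels, margin=1.0):
    """Margin-based contrastive loss over all implicit pairs in a batch.

    Same-label pairs are pulled together (squared distance penalty);
    different-label pairs are pushed apart up to `margin`. This is a
    generic sketch, not SSDST's exact loss definition.
    """
    n = features.shape[0]
    total, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):  # implicit pairing: all pairs in the batch
            d = np.linalg.norm(features[i] - features[j])
            if labels[i] == labels[j]:
                total += d ** 2                     # pull similar samples together
            else:
                total += max(0.0, margin - d) ** 2  # push dissimilar samples apart
            count += 1
    return total / count

# Example: two coincident same-label points and one distant other-label
# point incur zero loss (distant pair already exceeds the margin).
feats = np.array([[0.0, 0.0], [0.0, 0.0], [3.0, 0.0]])
print(contrastive_loss(feats, [0, 0, 1]))  # -> 0.0
```

In a tracking-by-similarity setting, `features` would be the single-stream CNN's embeddings of image patches and `labels` would indicate whether patches come from the same target, so no explicit pair construction is needed before the loss layer.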
Keywords