IET Computer Vision (Jun 2022)

M‐CoTransT: Adaptive spatial continuity in visual tracking

  • Chunxiao Fan,
  • Runqing Zhang,
  • Yue Ming

DOI
https://doi.org/10.1049/cvi2.12092
Journal volume & issue
Vol. 16, no. 4
pp. 350–363

Abstract

Visual tracking is an important area of computer vision. Current tracking methods based on the Siamese network employ self‐attention blocks in convolutional networks to extract semantic features that capture the structural information of an object. However, spatial continuity lies at the heart of two seemingly unrelated challenges in tracking: occlusion and similar distractors. Locating a target that reappears after occlusion is a spatially discontinuous task, while bounding‐box prediction should be constrained by spatial continuity to keep it from jumping to similar distractors. This study proposes M‐CoTransT, a novel tracking method that introduces spatial continuity into visual tracking through a confidence‐based adaptive Markov motion model (M‐model) and a novel correlation‐based feature fusion network (CoTransT). In particular, the M‐model assigns confidence to the nodes of the Markov motion model to estimate motion‐state continuity and predicts a more accurate search region for CoTransT, which in turn adds a cross‐correlation branch to the self‐attention tracking network to enhance the continuity of target appearance in the feature space. Extensive experiments on five challenging datasets (LaSOT, GOT‐10k, TrackingNet, OTB‐2015 and UAV123) demonstrate the effectiveness of the proposed M‐CoTransT in visual tracking.
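
To illustrate the role the abstract assigns to the M‐model, the toy sketch below shows one way a confidence‐gated motion model can predict a search region: high confidence trusts a constant‐velocity prediction, while low confidence (e.g. under occlusion) stays near the last reliable position and widens the search. This is a minimal sketch assuming NumPy; the function name predict_search_center and the specific blending rule are invented for illustration and are not the authors' formulation.

    import numpy as np

    def predict_search_center(last_pos, velocity, confidence, base_radius=1.0):
        """Blend a constant-velocity prediction with the last position by confidence.

        last_pos, velocity: (x, y) pairs; confidence in [0, 1].
        Returns a predicted search center and a radius that grows as
        confidence drops, so a target reappearing after occlusion can
        still fall inside the search region.
        """
        last_pos = np.asarray(last_pos, dtype=float)
        velocity = np.asarray(velocity, dtype=float)
        center = last_pos + confidence * velocity   # low confidence => stay put
        radius = base_radius * (2.0 - confidence)   # low confidence => widen search
        return center, radius

    center, radius = predict_search_center((100, 50), (4, -2), confidence=0.9)
    print(center, radius)  # [103.6  48.2] 1.1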
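
Similarly, the sketch below fuses a self‐attention branch with a depthwise cross‐correlation branch over template and search‐region features, in the spirit of adding a correlation branch to an attention‐based fusion network. It assumes PyTorch; the module name CorrAttnFusion, the tensor shapes and the fusion‐by‐concatenation design are assumptions for illustration, not CoTransT's actual architecture.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CorrAttnFusion(nn.Module):
        """Fuse template/search features via self-attention plus cross-correlation."""

        def __init__(self, channels: int, num_heads: int = 4):
            super().__init__()
            self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
            self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

        @staticmethod
        def depthwise_xcorr(search: torch.Tensor, template: torch.Tensor) -> torch.Tensor:
            # search: (B, C, Hs, Ws), template: (B, C, Ht, Wt).
            # Treat each (batch, channel) pair as its own group so the
            # template acts as a per-channel correlation kernel.
            b, c, hs, ws = search.shape
            kernel = template.reshape(b * c, 1, *template.shape[2:])
            out = F.conv2d(search.reshape(1, b * c, hs, ws), kernel,
                           padding="same", groups=b * c)
            return out.reshape(b, c, hs, ws)

        def forward(self, search: torch.Tensor, template: torch.Tensor) -> torch.Tensor:
            b, c, hs, ws = search.shape
            # Attention branch: search tokens attend to template tokens.
            q = search.flatten(2).transpose(1, 2)    # (B, Hs*Ws, C)
            kv = template.flatten(2).transpose(1, 2) # (B, Ht*Wt, C)
            attn_out, _ = self.attn(q, kv, kv)
            attn_map = attn_out.transpose(1, 2).reshape(b, c, hs, ws)
            # Correlation branch: spatially localized template response.
            corr_map = self.depthwise_xcorr(search, template)
            # Concatenate both branches and project back to C channels.
            return self.proj(torch.cat([attn_map, corr_map], dim=1))

    fusion = CorrAttnFusion(channels=64)
    z = torch.randn(2, 64, 7, 7)    # template features
    x = torch.randn(2, 64, 31, 31)  # search-region features
    print(fusion(x, z).shape)       # torch.Size([2, 64, 31, 31])

The correlation branch keeps the spatially localized matching signal that pure attention can dilute, which is one plausible reading of how a cross‐correlation branch supports appearance continuity in the feature space.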