IET Computer Vision (Jun 2022)

M‐CoTransT: Adaptive spatial continuity in visual tracking

  • Chunxiao Fan,
  • Runqing Zhang,
  • Yue Ming

DOI
https://doi.org/10.1049/cvi2.12092
Journal volume & issue
Vol. 16, no. 4
pp. 350–363

Abstract

Visual tracking is an important area of computer vision. Current tracking methods based on the Siamese network employ self‐attention blocks in convolutional networks to extract semantic features that capture the structural information of an object. However, spatial continuity lies at the heart of two seemingly unrelated challenges in tracking: occlusion and similar distractors. Locating a target that reappears after occlusion is a spatially discontinuous task, while bounding‐box prediction should be constrained by spatial continuity to keep it from jumping to similar distractors. This study proposes M‐CoTransT, a novel tracking method that introduces spatial continuity into visual tracking through a confidence‐based adaptive Markov motion model (M‐model) and a novel correlation‐based feature fusion network (CoTransT). In particular, the M‐model assigns confidence to the nodes of the Markov motion model to estimate motion‐state continuity and predicts a more accurate search region for CoTransT, which in turn adds a cross‐correlation branch to the self‐attention tracking network to enhance the continuity of target appearance in the feature space. Extensive experiments on five challenging datasets (LaSOT, GOT‐10k, TrackingNet, OTB‐2015 and UAV123) demonstrate the effectiveness of the proposed M‐CoTransT in visual tracking.
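
To illustrate the role the abstract assigns to the M‐model, the toy sketch below shows one way a confidence‐gated motion model can predict a search region: high confidence trusts a constant‐velocity prediction, while low confidence (e.g. under occlusion) stays near the last reliable position and widens the search. This is a minimal sketch assuming NumPy; the function name predict_search_center and the specific blending rule are invented for illustration and are not the authors' formulation.

    import numpy as np

    def predict_search_center(last_pos, velocity, confidence, base_radius=1.0):
        """Blend a constant-velocity prediction with the last position by confidence.

        last_pos, velocity: (x, y) pairs; confidence in [0, 1].
        Returns a predicted search center and a radius that grows as
        confidence drops, so a target reappearing after occlusion can
        still fall inside the search region.
        """
        last_pos = np.asarray(last_pos, dtype=float)
        velocity = np.asarray(velocity, dtype=float)
        center = last_pos + confidence * velocity   # low confidence => stay put
        radius = base_radius * (2.0 - confidence)   # low confidence => widen search
        return center, radius

    center, radius = predict_search_center((100, 50), (4, -2), confidence=0.9)
    print(center, radius)  # [103.6  48.2] 1.1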
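
Similarly, the sketch below fuses a self‐attention branch with a depthwise cross‐correlation branch over template and search‐region features, in the spirit of adding a correlation branch to an attention‐based fusion network. It assumes PyTorch; the module name CorrAttnFusion, the tensor shapes and the fusion‐by‐concatenation design are assumptions for illustration, not CoTransT's actual architecture.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CorrAttnFusion(nn.Module):
        """Fuse template/search features via self-attention plus cross-correlation."""

        def __init__(self, channels: int, num_heads: int = 4):
            super().__init__()
            self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
            self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

        @staticmethod
        def depthwise_xcorr(search: torch.Tensor, template: torch.Tensor) -> torch.Tensor:
            # search: (B, C, Hs, Ws), template: (B, C, Ht, Wt).
            # Treat each (batch, channel) pair as its own group so the
            # template acts as a per-channel correlation kernel.
            b, c, hs, ws = search.shape
            kernel = template.reshape(b * c, 1, *template.shape[2:])
            out = F.conv2d(search.reshape(1, b * c, hs, ws), kernel,
                           padding="same", groups=b * c)
            return out.reshape(b, c, hs, ws)

        def forward(self, search: torch.Tensor, template: torch.Tensor) -> torch.Tensor:
            b, c, hs, ws = search.shape
            # Attention branch: search tokens attend to template tokens.
            q = search.flatten(2).transpose(1, 2)    # (B, Hs*Ws, C)
            kv = template.flatten(2).transpose(1, 2) # (B, Ht*Wt, C)
            attn_out, _ = self.attn(q, kv, kv)
            attn_map = attn_out.transpose(1, 2).reshape(b, c, hs, ws)
            # Correlation branch: spatially localized template response.
            corr_map = self.depthwise_xcorr(search, template)
            # Concatenate both branches and project back to C channels.
            return self.proj(torch.cat([attn_map, corr_map], dim=1))

    fusion = CorrAttnFusion(channels=64)
    z = torch.randn(2, 64, 7, 7)    # template features
    x = torch.randn(2, 64, 31, 31)  # search-region features
    print(fusion(x, z).shape)       # torch.Size([2, 64, 31, 31])

The correlation branch keeps the spatially localized matching signal that pure attention can dilute, which is one plausible reading of how a cross‐correlation branch supports appearance continuity in the feature space.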