IET Image Processing (Apr 2022)

Semantic and context features integration for robust object tracking

  • Jinzhen Yao,
  • Jianlin Zhang,
  • Zhixing Wang,
  • Linsong Shao

DOI
https://doi.org/10.1049/ipr2.12407
Journal volume & issue
Vol. 16, no. 5
pp. 1268 – 1279

Abstract


Siamese network‐based object tracking simultaneously learns features of a target object marked in the first frame and those of the object in subsequent frames, then measures the similarity between the two feature sets to recognize and locate the object. Owing to their efficiency and high accuracy, Siamese networks have attracted much attention recently. However, tracking accuracy decreases significantly under scale changes, occlusion, and pose variations, due to the way Siamese networks estimate feature similarity. To address this issue, the authors propose a tracking algorithm, named Semantic and context features integration for robust object tracking, that integrates local and global features of the object. Local features provide context information for tracking parts of the object, while global features contain semantic information for tracking the object as a whole. The authors carefully design local and global classification and regression heads and integrate them into one uniform framework to achieve integrated tracking. This method effectively alleviates the loss of accuracy in complex scenes involving scale changes, deformation, and occlusion. Extensive experiments on public tracking datasets, including VOT2016, VOT2019, OTB100, GOT‐10k, and LaSOT, demonstrate that this method achieves state‐of‐the‐art (SOTA) performance at 45 FPS on a single RTX 2060 Super GPU, confirming its effectiveness and efficiency.
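To make the similarity-matching step described above concrete, the sketch below shows the generic cross-correlation operation at the heart of Siamese trackers: a template feature map is slid over a search-region feature map, and the inner product at each offset forms a response map whose peak locates the target. This is a minimal NumPy illustration of the general technique, not the authors' specific architecture or their local/global head design; all shapes and names are illustrative assumptions.

```python
import numpy as np

def cross_correlation(template, search):
    """Slide a template feature map (th, tw, c) over a search feature
    map (sh, sw, c) and return a (sh-th+1, sw-tw+1) response map.
    Each entry is the inner product between the template and the
    corresponding search patch; the argmax locates the target.
    Illustrative sketch of generic Siamese matching, not the paper's model."""
    th, tw, c = template.shape
    sh, sw, _ = search.shape
    out_h, out_w = sh - th + 1, sw - tw + 1
    response = np.zeros((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            patch = search[y:y + th, x:x + tw, :]
            # Inner product as a similarity score between feature maps
            response[y, x] = np.sum(patch * template)
    return response

# Usage: plant the template inside an otherwise empty search map and
# verify the response peak recovers the planted location.
rng = np.random.default_rng(0)
template = rng.random((3, 3, 2))
search = np.zeros((8, 8, 2))
search[2:5, 3:6, :] = template
resp = cross_correlation(template, search)
peak = np.unravel_index(np.argmax(resp), resp.shape)
```

In real trackers this correlation runs on deep CNN features and is implemented as a convolution on the GPU; the failure modes the paper targets (scale change, occlusion, deformation) arise because a single global inner product of this form cannot adapt when only part of the object matches the template.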