Multi‐template temporal information fusion for Siamese object tracking

Xiaofeng Lu; Zhengyang Wang; Xuan Wang; Xinhong Hei

doi:10.1049/cvi2.12128

IET Computer Vision (Feb 2023)

Affiliations

Xiaofeng Lu: Department of Computer Science and Technology Xi'an University of Technology Xi'an Shaanxi China
Zhengyang Wang: Department of Computer Science and Technology Xi'an University of Technology Xi'an Shaanxi China
Xuan Wang: Department of Computer Science and Technology Xi'an University of Technology Xi'an Shaanxi China
Xinhong Hei: Department of Computer Science and Technology Xi'an University of Technology Xi'an Shaanxi China

DOI: https://doi.org/10.1049/cvi2.12128
Journal volume & issue: Vol. 17, no. 1
pp. 51 – 61

Abstract

Object tracking algorithms based on Siamese networks often extract the deep features of the target from the first frame of the video sequence as a template and use that template for the whole tracking process. Because the manually annotated target in the first frame is more accurate, these algorithms often perform stably. However, a template extracted only from the first frame struggles to adapt to changing target features. Inspired by transformer‐based feature fusion networks, this paper proposes a template update module, the multi‐template temporal information fusion module (MTFM), which can be trained offline. By fusing multiple target template features along the time series, the template can always adapt to changes in target appearance during tracking. To train the MTFM, this paper proposes a training method that uses time series data with Mean Square Error (MSE) as the loss function. The MTFM is applied to the SiamFC++ tracker and obtains good experimental results on three challenging datasets: VOT2016, OTB100 and GOT‐10k. The algorithm runs at about 200 fps on a graphics processing unit (GPU), which exhibits good real‐time performance.
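The general idea of fusing several templates over time and training with an MSE loss can be illustrated with a minimal sketch. This is not the authors' MTFM, whose architecture is not described in the abstract. It assumes each template is a flat feature vector, uses a single scaled dot‐product attention step with the first‐frame template as the query, and has no learned weights. The names `fuse_templates` and `mse` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_templates(templates):
    """Fuse a time-ordered list of template feature vectors into one vector.

    Assumption for illustration: the first (annotated) template is the query,
    and every template, including the first, serves as key and value.
    """
    query = templates[0]
    dim = len(query)
    scale = math.sqrt(dim)
    # Scaled dot-product similarity between the query and each template.
    scores = [sum(q * k for q, k in zip(query, t)) / scale for t in templates]
    weights = softmax(scores)
    # Attention-weighted sum of the templates.
    return [sum(w * t[i] for w, t in zip(weights, templates)) for i in range(dim)]


def mse(predicted, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


# Toy data: three templates collected at different time steps (made-up values).
templates = [[1.0, 0.0], [0.8, 0.2], [0.6, 0.4]]
fused = fuse_templates(templates)
# Hypothetical training target, e.g. the template of a later frame.
loss = mse(fused, [0.5, 0.5])
```

In an offline‐trained module, the attention would carry learnable projections and the MSE loss would drive their updates; here the loss is only computed to show where it enters.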

Published in IET Computer Vision

ISSN: 1751-9632 (Print); 1751-9640 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/17519640
