VLFSE: Enhancing visual tracking through visual language fusion and state update evaluator

Fuchao Yang; Mingkai Jiang; Qiaohong Hao; Xiaolei Zhao; Qinghe Feng

Machine Learning with Applications (Dec 2024)

Affiliations

Fuchao Yang: School of Intelligent Engineering, Henan Institute of Technology, Xinxiang, Henan, China
Mingkai Jiang: School of Intelligent Engineering, Henan Institute of Technology, Xinxiang, Henan, China
Qiaohong Hao: School of Intelligent Engineering, Henan Institute of Technology, Xinxiang, Henan, China
Xiaolei Zhao: School of Intelligent Engineering, Henan Institute of Technology, Xinxiang, Henan, China; China Radio Wave Propagation Research Institute, Qingdao, Shandong, China
Qinghe Feng: School of Intelligent Engineering, Henan Institute of Technology, Xinxiang, Henan, China; Corresponding author.

Journal volume & issue: Vol. 18, p. 100588

Abstract

Recently, visual tracking algorithms have achieved impressive results by combining dynamic templates. However, the instability of visual images and the incorrect timing of template updates lead to decreased tracking accuracy and stability in intricate scenarios. To address these issues, we propose a visual tracking algorithm through visual language fusion and a state update evaluator (VLFSE). Specifically, our approach introduces a multimodal attention mechanism that uses self-attention to mine and integrate information from diverse sources effectively. This mechanism ensures a richer, context-aware representation of the target, enabling more accurate tracking even in complex scenes. Moreover, we recognize the critical need for precise template updates to maintain tracking accuracy over time. To this end, we develop a state update evaluator, a component trained online to assess the necessity and timing of template updates accurately. This evaluator acts as a safeguard, preventing erroneous updates and ensuring the tracker adapts optimally to changes in the target’s appearance. The experimental results on challenging visual language tracking datasets demonstrate our tracker’s superior performance, showcasing its adaptability and accuracy in complex tracking scenarios.
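The two ideas in the abstract, self-attention over joint visual and language features and a gated template update, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (fuse, maybe_update_template), the identity query/key/value projections, the fixed threshold, and the externally supplied score are all assumptions made for illustration. In particular, the paper's evaluator is trained online, whereas here its output is simply passed in as a number.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections.

    Assumption: a real model would use learned projections; they are
    omitted here to keep the sketch dependency-free.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Scaled dot-product scores of this query against every token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Weighted sum of the value vectors (here, the tokens themselves).
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def fuse(visual_tokens, language_tokens):
    """Hypothetical fusion: attend over the concatenated token sequence,
    then keep the (now language-aware) visual positions."""
    joint = visual_tokens + language_tokens
    fused = self_attention(joint)
    return fused[: len(visual_tokens)]


def maybe_update_template(template, candidate, score, threshold=0.7):
    """Hypothetical update gate: replace the template only when the
    evaluator's score (assumed to lie in [0, 1]) clears the threshold."""
    return candidate if score >= threshold else template


visual = [[1.0, 0.0], [0.0, 1.0]]   # toy visual tokens
language = [[1.0, 1.0]]             # toy language token
fused_visual = fuse(visual, language)
template = maybe_update_template("old_template", "new_template", score=0.9)
```

The gate captures only the decision step: a low score leaves the current template in place, which is the safeguard against erroneous updates that the abstract describes.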

Published in Machine Learning with Applications

ISSN: 2666-8270 (Online)
Publisher: Elsevier
Country of publisher: United Kingdom
LCC subjects: Science: Science (General): Cybernetics; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.journals.elsevier.com/machine-learning-with-applications
