ACSiamRPN: Adaptive Context Sampling for Visual Object Tracking

Xiaofei Qin; Yipeng Zhang; Hang Chang; Hao Lu; Xuedian Zhang

doi:10.3390/electronics9091528

Electronics (Sep 2020)

Affiliations

Xiaofei Qin: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
Yipeng Zhang: School of Mechanical Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
Hang Chang: Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
Hao Lu: Guangxi Yuchai Machinery Co., Ltd., Nanning 530007, China
Xuedian Zhang: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China

Journal volume & issue: Vol. 9, no. 9
p. 1528

Abstract

In the field of visual object tracking, the Siamese network tracker based on a region proposal network (SiamRPN) has achieved promising performance in both speed and accuracy. However, it does not consider the relationships and differences among the long-range context information of different objects. In this paper, we add a global context block (GC block), which is lightweight and can effectively model long-range dependencies, to the Siamese network part of SiamRPN so that the tracker can better understand the tracking scene. We also propose a novel convolution module, the cropping-inside selective kernel block (CiSK block), based on selective kernel convolution (SK convolution, a module proposed in selective kernel networks), and use it in the region proposal network (RPN) part of SiamRPN, where it adaptively adjusts the receptive field size for different types of objects. The CiSK block makes two improvements to SK convolution. First, in the fusion step, it uses both global average pooling (GAP) and global maximum pooling (GMP) to enhance global information embedding. Second, after the selection step, it crops out the outermost pixels of the features to reduce the impact of padding operations. Experimental results show that on the OTB100 benchmark, our tracker achieves an accuracy of 0.857 and a success rate of 0.643. On the VOT2016 and VOT2019 benchmarks, it achieves expected average overlap (EAO) scores of 0.394 and 0.240, respectively.
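The abstract describes the two building blocks only in words. The following is a minimal NumPy sketch of both under stated assumptions; it is not the authors' implementation. The GC block follows the published GCNet design (softmax attention pooling, a bottleneck transform with layer normalization, and broadcast addition) but omits the learnable LayerNorm parameters. For the CiSK block, the following are assumptions: the two branches are hypothetical depthwise 3x3 and 5x5 convolutions (the original SK convolution realizes its larger kernel with a dilated 3x3 convolution), GAP and GMP are combined by summation, exactly one pixel is cropped from each border, and all weights are random placeholders. The function names `gc_block`, `cisk_block`, and `depthwise_conv2d_same` are hypothetical.

```python
import numpy as np

def softmax(a, axis):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def depthwise_conv2d_same(x, kernels):
    """Per-channel 2-D cross-correlation with zero 'same' padding.
    x: (C, H, W); kernels: (C, k, k) with odd k (hypothetical helper)."""
    c, h, w = x.shape
    k = kernels.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero padding on H and W only
    out = np.zeros_like(x, dtype=float)
    for i in range(k):
        for j in range(k):
            out += kernels[:, i, j][:, None, None] * xp[:, i:i + h, j:j + w]
    return out

def layer_norm(v, eps=1e-5):
    # LayerNorm without learnable scale/shift (simplification).
    return (v - v.mean()) / np.sqrt(v.var() + eps)

def gc_block(x, w_k, w_v1, w_v2):
    """GC block sketch. x: (C, H, W); w_k: (C,); w_v1: (r, C); w_v2: (C, r)."""
    c, h, w = x.shape
    flat = x.reshape(c, h * w)                              # (C, N)
    attn = softmax(w_k @ flat, axis=0)                      # (N,) spatial attention
    context = flat @ attn                                   # (C,) global context vector
    t = w_v2 @ np.maximum(layer_norm(w_v1 @ context), 0.0)  # bottleneck transform
    return x + t[:, None, None]                             # same vector added at every position

def cisk_block(x, k3, k5, w_fc, w_sel, crop=1):
    """CiSK sketch. x: (C, H, W); k3: (C, 3, 3); k5: (C, 5, 5);
    w_fc: (d, C) squeeze layer; w_sel: (2, C, d) per-branch selection layers."""
    # Split: two branches with different receptive fields.
    u = np.stack([depthwise_conv2d_same(x, k3),
                  depthwise_conv2d_same(x, k5)])            # (2, C, H, W)
    # Fuse: sum the branches, then embed global info with GAP + GMP (summed: an assumption).
    u_sum = u.sum(axis=0)
    s = u_sum.mean(axis=(1, 2)) + u_sum.max(axis=(1, 2))    # (C,)
    z = np.maximum(w_fc @ s, 0.0)                           # (d,) with ReLU
    # Select: channel-wise softmax across the two branches.
    attn = softmax(w_sel @ z, axis=0)                       # (2, C), sums to 1 over branches
    v = (attn[:, :, None, None] * u).sum(axis=0)            # (C, H, W)
    # Crop inside: drop the outermost pixels, which depend most on padding.
    return v[:, crop:-crop, crop:-crop] if crop > 0 else v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, H, W, d, r = 8, 15, 15, 4, 2
    x = rng.standard_normal((C, H, W))
    y = gc_block(x, rng.standard_normal(C),
                 rng.standard_normal((r, C)), rng.standard_normal((C, r)))
    out = cisk_block(x, rng.standard_normal((C, 3, 3)), rng.standard_normal((C, 5, 5)),
                     rng.standard_normal((d, C)), rng.standard_normal((2, C, d)))
    print(y.shape, out.shape)  # (8, 15, 15) (8, 13, 13)
```

The two blocks are applied to the same input here only to show their shapes. In the tracker described above, the GC block sits in the Siamese feature extractor and the CiSK block in the RPN head. Because the branch weights are a softmax across branches, the CiSK output is a per-channel convex combination of the two receptive fields, and the final crop shrinks each spatial side by two pixels.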

Published in Electronics

ISSN: 2079-9292 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics
Website: http://www.mdpi.com/journal/electronics