Semi-Supervised Video Object Segmentation Based on Local and Global Consistency Learning

Huagang Liang; Lihua Liu; Ying Bo; Chao Zuo

doi:10.1109/ACCESS.2021.3112014

IEEE Access (Jan 2021)

Affiliations

Huagang Liang: College of Electronic and Control Engineering, Chang’an University, Xi’an, China
Lihua Liu: College of Electronic and Control Engineering, Chang’an University, Xi’an, China
Ying Bo: College of Electronic and Control Engineering, Chang’an University, Xi’an, China
Chao Zuo: College of Electronic and Control Engineering, Chang’an University, Xi’an, China

DOI: https://doi.org/10.1109/ACCESS.2021.3112014
Journal volume & issue: Vol. 9, pp. 127293–127304

Abstract

The variety and uneven quality of videos on the Internet pose additional challenges for video processing algorithms such as video object segmentation. Most existing video object segmentation methods borrow modules from other fields as additional structures in the segmentation model. Combining modules can improve accuracy, but it also reduces speed. This paper proposes a semi-supervised video object segmentation method based on local and global consistency learning that achieves fast segmentation without relying on additional structures. First, we extract image embedding features with GhostNet, a lightweight network, and build a graph model from the similarity between pixel embeddings. Second, we adopt the local and global consistency learning framework to construct the label conduction model. Third, to reduce memory usage and improve inference speed, we propose a reference-frame sampling strategy that considers both local and global information. Finally, we build a high-speed monitoring video dataset to verify the method in a practical application. Our method achieves 69.5% $J\&F$ mean at 46 FPS on the DAVIS 2017 dataset and 68.2% $J\&F$ on the high-speed monitoring video dataset, indicating that it generalizes well and is robust in practical applications.
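
The graph construction and label conduction steps described in the abstract can be illustrated with the classic closed-form local and global consistency propagation, F = (I - alpha * S)^(-1) Y, where S is the symmetrically normalized affinity matrix. The following is a minimal sketch and not the authors' implementation: the Gaussian affinity, the function name `propagate_labels`, and the parameters `alpha` and `sigma` are assumptions chosen for illustration, and the abstract does not specify how the paper computes pixel similarity.

```python
import numpy as np


def propagate_labels(features, labels, num_classes, alpha=0.9, sigma=1.0):
    """Spread labels over a similarity graph (hypothetical helper).

    features: (n, d) array of per-pixel embeddings.
    labels:   (n,) int array; class index for labeled nodes, -1 otherwise.
    Returns an (n,) array of predicted class indices.
    """
    n = features.shape[0]

    # Pairwise squared Euclidean distances between embeddings.
    sq_norms = np.sum(features ** 2, axis=1)
    dist2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T
    dist2 = np.maximum(dist2, 0.0)  # clip tiny negatives from round-off

    # Assumed Gaussian affinity, with no self-loops.
    W = np.exp(-dist2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalization: S = D^{-1/2} W D^{-1/2}.
    degree = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(degree + 1e-12)
    S = W * inv_sqrt[:, None] * inv_sqrt[None, :]

    # One-hot matrix of known labels; unlabeled rows stay zero.
    Y = np.zeros((n, num_classes))
    known = np.flatnonzero(labels >= 0)
    Y[known, labels[known]] = 1.0

    # Closed-form solution of the propagation: (I - alpha * S) F = Y.
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    return F.argmax(axis=1)
```

In a video setting, `features` would plausibly stack pixel embeddings from the sampled reference frames and the current frame, with `labels` taken from the reference masks. Solving a dense n-by-n system is only practical for small graphs, which is one reason a reference-frame sampling strategy matters for memory and speed.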

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
