Deep Features Homography Transformation Fusion Network—A Universal Foreground Segmentation Algorithm for PTZ Cameras and a Comparative Study

Ye Tao; Zhihao Ling

doi:10.3390/s20123420

Sensors (Jun 2020)

Deep Features Homography Transformation Fusion Network—A Universal Foreground Segmentation Algorithm for PTZ Cameras and a Comparative Study

Ye Tao,
Zhihao Ling

Affiliations

Ye Tao: Key Laboratory of Advanced Control and Optimization for Chemical Processes, Ministry of Education, East China University of Science and Technology, Shanghai 200237, China
Zhihao Ling: Key Laboratory of Advanced Control and Optimization for Chemical Processes, Ministry of Education, East China University of Science and Technology, Shanghai 200237, China

DOI: https://doi.org/10.3390/s20123420
Journal volume & issue: Vol. 20, no. 12
p. 3420

Abstract

Read online

The foreground segmentation method is a crucial first step for many video analysis methods such as action recognition and object tracking. In the past five years, convolutional neural network based foreground segmentation methods have made a great breakthrough. However, most of them pay more attention to stationary cameras and have constrained performance on the pan–tilt–zoom (PTZ) cameras. In this paper, an end-to-end deep features homography transformation and fusion network based foreground segmentation method (HTFnetSeg) is proposed for surveillance videos recorded by PTZ cameras. In the kernel of HTFnetSeg, there is the combination of an unsupervised semantic attention homography estimation network (SAHnet) for frames alignment and a spatial transformed deep features fusion network (STDFFnet) for segmentation. The semantic attention mask in SAHnet reinforces the network to focus on background alignment by reducing the noise that comes from the foreground. STDFFnet is designed to reuse the deep features extracted during the semantic attention mask generation step by aligning the features rather than only the frames, with a spatial transformation technique in order to reduce the algorithm complexity. Additionally, a conservative strategy is proposed for the motion map based post-processing step to further reduce the false positives that are brought by semantic noise. The experiments on both CDnet2014 and Lasiesta show that our method outperforms many state-of-the-art methods, quantitively and qualitatively.

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors

About the journal

Abstract

Keywords