A Fast Lightweight 3D Separable Convolutional Neural Network With Multi-Input Multi-Output for Moving Object Detection

Bingxin Hou; Ying Liu; Nam Ling; Lingzhi Liu; Yongxiong Ren

doi:10.1109/ACCESS.2021.3123975

IEEE Access (Jan 2021)

Affiliations

Bingxin Hou: Department of Computer Science and Engineering, Santa Clara University, Santa Clara, CA, USA
Ying Liu: Department of Computer Science and Engineering, Santa Clara University, Santa Clara, CA, USA
Nam Ling: Department of Computer Science and Engineering, Santa Clara University, Santa Clara, CA, USA
Lingzhi Liu: Kwai, Inc., Palo Alto, CA, USA
Yongxiong Ren: Kwai, Inc., Palo Alto, CA, USA

DOI: https://doi.org/10.1109/ACCESS.2021.3123975
Journal volume & issue: Vol. 9, pp. 148433–148448

Abstract

Advances in moving object detection have been driven by the active application of deep learning methods. However, many existing models achieve superior detection accuracy at the cost of high computational complexity and slow inference speed. This has hindered the use of such models in mobile and embedded vision tasks, which must be carried out in a timely fashion on computationally limited platforms. In this paper, we propose a super-fast (inference speed: 154 fps) and lightweight (model size: 1.45 MB) end-to-end 3D separable convolutional neural network with a multi-input multi-output (MIMO) strategy, named “3DS_MM”, for moving object detection. To improve detection accuracy, the proposed model adopts 3D convolution, which is better suited than 2D convolution to extracting both spatial and temporal information from video data. To reduce model size and computational complexity, the standard 3D convolution is decomposed into depthwise and pointwise convolutions. In addition, we propose a MIMO strategy to increase inference speed: the network takes multiple frames as input and outputs detection results for multiple frames. We further conduct scene dependent evaluation (SDE) and scene independent evaluation (SIE) on the benchmark CDnet2014 and DAVIS2016 datasets. Compared to state-of-the-art approaches, the proposed method significantly increases inference speed and reduces model size, while achieving the highest detection accuracy in the SDE setup and maintaining competitive detection accuracy in the SIE setup.
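
The saving from decomposing a standard 3D convolution into depthwise and pointwise convolutions can be illustrated by counting weights. The sketch below is not the authors' code: it uses the usual depthwise separable formulation (one spatio-temporal filter per input channel, followed by a 1x1x1 channel-mixing convolution) and ignores biases. The function names, the 3x3x3 kernel, and the channel counts 32 and 64 are hypothetical values chosen only for illustration, not taken from the paper.

```python
def conv3d_params(c_in, c_out, k=(3, 3, 3)):
    """Weight count of a standard 3D convolution (bias ignored)."""
    kt, kh, kw = k
    return kt * kh * kw * c_in * c_out


def separable_conv3d_params(c_in, c_out, k=(3, 3, 3)):
    """Weight count of a depthwise 3D conv plus a 1x1x1 pointwise conv (bias ignored)."""
    kt, kh, kw = k
    depthwise = kt * kh * kw * c_in  # one kt x kh x kw filter per input channel
    pointwise = c_in * c_out         # 1x1x1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, for illustration only (not from the paper).
c_in, c_out = 32, 64
standard = conv3d_params(c_in, c_out)
separable = separable_conv3d_params(c_in, c_out)
ratio = standard / separable
print(standard, separable, round(ratio, 1))
```

For these illustrative sizes the standard layer has 55296 weights and the separable pair has 2912, roughly a 19-fold reduction. The actual savings in 3DS_MM depend on its layer configuration, which the abstract does not give.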

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639