Unsupervised learning of depth estimation, camera motion prediction and dynamic object localization from video

Delong Yang; Xunyu Zhong; Dongbing Gu; Xiafu Peng; Gongliu Yang; Chaosheng Zou

doi:10.1177/1729881420909653

International Journal of Advanced Robotic Systems (Mar 2020)

Affiliations

Delong Yang: Department of Automation, School of Aerospace Engineering, Xiamen University, Xiamen, China
Xunyu Zhong: Department of Automation, School of Aerospace Engineering, Xiamen University, Xiamen, China
Dongbing Gu: School of Computer Science and Electronic Engineering, Faculty of Science and Health, University of Essex, Colchester, Essex, UK
Xiafu Peng: Department of Automation, School of Aerospace Engineering, Xiamen University, Xiamen, China
Gongliu Yang: Department of Optoelectronic Engineering, School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing, China
Chaosheng Zou: Department of Automation, School of Aerospace Engineering, Xiamen University, Xiamen, China

Journal volume & issue: Vol. 17

Abstract

Estimating scene depth, predicting camera motion and localizing dynamic objects from monocular videos are fundamental but challenging research topics in computer vision. Deep learning has recently demonstrated strong performance on these tasks. This article presents a novel unsupervised deep learning framework for scene depth estimation, camera motion prediction and dynamic object localization from videos. Consecutive stereo image pairs are used to train the system, while only monocular images are needed for inference. The supervisory signals for the training stage come from various forms of image synthesis. Because consecutive stereo video is used, the image synthesis draws on both spatial and temporal photometric errors. Furthermore, to mitigate the impact of occlusions, adaptive left-right consistency and forward-backward consistency losses are added to the objective function. Experimental results on the KITTI and Cityscapes datasets demonstrate that our method is more effective in depth estimation, camera motion prediction and dynamic object localization than previous models.
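
The idea of a spatial photometric error obtained through image synthesis can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `warp_right_to_left` and `photometric_l1`, the plain L1 error, and the rectified-stereo convention that a left-image pixel at column x appears at column x - d in the right image are all assumptions made for illustration. The sketch synthesizes a left view from a right image and a left disparity map, then measures how far the synthesized view is from the real left image.

```python
import numpy as np


def warp_right_to_left(right, disp_left):
    """Synthesize a left view from a grayscale right image (hypothetical helper).

    Assumes rectified stereo: left(x) corresponds to right(x - d).
    Sampling uses linear interpolation along each row, with coordinates
    clamped to the image border.
    """
    h, w = right.shape
    # Column coordinates to sample in the right image for every left pixel.
    xs = np.arange(w, dtype=np.float64)[None, :] - disp_left
    xs = np.clip(xs, 0.0, w - 1.0)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    return (1.0 - frac) * right[rows, x0] + frac * right[rows, x1]


def photometric_l1(target, synthesized):
    """Mean absolute intensity difference (an assumed, simplified error)."""
    return float(np.mean(np.abs(target - synthesized)))


# Toy example: the right image is a horizontal ramp and the true disparity is 2.
right = np.tile(np.arange(8, dtype=np.float64), (4, 1))
left = np.maximum(right - 2.0, 0.0)

good = photometric_l1(left, warp_right_to_left(right, np.full((4, 8), 2.0)))
bad = photometric_l1(left, warp_right_to_left(right, np.zeros((4, 8))))
print(good, bad)
```

In this toy case the correct disparity reproduces the left image, so its error is lower than that of a zero-disparity guess. A training objective of this kind would minimize such an error with respect to the predicted disparity; a temporal counterpart would warp between consecutive frames using predicted depth and camera motion instead of a stereo disparity.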

Published in International Journal of Advanced Robotic Systems

ISSN: 1729-8814 (Online)
Publisher: SAGE Publishing
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://journals.sagepub.com/home/arx
