IEEE Access (Jan 2019)
An Adaptive Unsupervised Learning Framework for Monocular Depth Estimation
Abstract
Depth estimation from a single image plays an important role in 3D scene perception. Owing to the development of deep convolutional neural networks (CNNs), monocular depth estimation models have achieved many promising results. However, the need for manually labeled per-pixel ground truth limits the application of these supervised methods. Based on a geometric constraint between consecutive stereo images, we propose an unsupervised method to infer the scene structure. We train the model with consecutive stereo images as input, while only a single image is required at test time. In contrast to previous works, this paper presents an adaptive loss function to handle the regions that do not overlap between consecutive images. Moreover, by exploiting the discontinuity of pixels in the edge regions and their continuity in the non-edge regions of a depth image, we propose a novel depth smoothness loss to improve the accuracy of the model. In addition, as an auxiliary task, our model also estimates the camera motion between consecutive images. Experimental results on the KITTI and Cityscapes datasets show that our model outperforms other unsupervised frameworks and some supervised frameworks.
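The two loss ideas mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's formulation, which the abstract does not give. The smoothness term uses a commonly used edge-aware form (depth gradients down-weighted where the image gradient is large), and the photometric term simply ignores pixels flagged as non-overlapping. The function names, the grayscale (H, W) input layout, and the `valid_mask` argument are all assumptions made for illustration.

```python
import numpy as np


def edge_aware_smoothness(depth, image):
    """Illustrative edge-aware smoothness term (assumed form, not the paper's).

    depth: (H, W) predicted depth map.
    image: (H, W) grayscale image with values in [0, 1].
    Depth gradients are penalised less where the image itself has an edge.
    """
    # Absolute finite differences of depth along x and y.
    d_dx = np.abs(depth[:, 1:] - depth[:, :-1])
    d_dy = np.abs(depth[1:, :] - depth[:-1, :])
    # Absolute finite differences of the image along x and y.
    i_dx = np.abs(image[:, 1:] - image[:, :-1])
    i_dy = np.abs(image[1:, :] - image[:-1, :])
    # Weights are close to 1 in flat regions and decay at image edges.
    loss_x = (d_dx * np.exp(-i_dx)).mean()
    loss_y = (d_dy * np.exp(-i_dy)).mean()
    return float(loss_x + loss_y)


def masked_photometric_loss(target, warped, valid_mask):
    """Illustrative L1 photometric loss restricted to overlapping pixels.

    valid_mask: (H, W) array, 1 where the warped view overlaps the target
    view and 0 elsewhere (hypothetical input; how it is computed is not
    specified here).
    """
    diff = np.abs(target - warped) * valid_mask
    # Normalise by the number of valid pixels; avoid division by zero.
    return float(diff.sum() / max(valid_mask.sum(), 1.0))
```

In this sketch a depth step that coincides with an image edge costs less than the same step in a textureless region, and pixels outside the overlap contribute nothing to the photometric term.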
Keywords