Remote Sensing (May 2021)

Unsupervised Learning of Depth from Monocular Videos Using 3D-2D Corresponding Constraints

  • Fusheng Jin,
  • Yu Zhao,
  • Chuanbing Wan,
  • Ye Yuan,
  • Shuliang Wang

DOI
https://doi.org/10.3390/rs13091764
Journal volume & issue
Vol. 13, no. 9
p. 1764

Abstract

Depth estimation can greatly assist object detection, localization, path planning, and related tasks. However, existing deep-learning-based methods demand substantial computing power and often cannot be applied directly on autonomous moving platforms (AMP). Fifth-generation (5G) mobile and wireless communication systems have attracted researchers' attention because they provide the network foundation for cloud computing and edge computing, which makes it possible to use deep learning methods on AMP. This paper proposes an unsupervised depth prediction method for AMP that learns from video sequences and simultaneously estimates the depth structure of the scene and the ego-motion. Compared with existing unsupervised learning methods, our method smooths the 3D corresponding vector field based on the 2D image, so that the spatial correspondence among pixels is consistent with the image regions. This effectively improves the depth prediction ability of the neural network. Our experiments on the KITTI driving dataset demonstrate that our method outperforms previous learning-based methods, and results on the Apolloscape and Cityscapes datasets show that the proposed method generalizes well.
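The abstract does not spell out the loss, so the following is only a minimal sketch of one plausible reading of "smoothing the 3D corresponding vector field based on the 2D image". It assumes a pinhole camera with intrinsics (fx, fy, cx, cy), a rigid ego-motion given as a rotation R and translation t, a per-pixel 3D vector equal to a point's displacement under that motion, and an edge-aware weight exp(-|intensity difference|) borrowed from common unsupervised depth smoothness terms. All function names and these modeling choices are hypothetical, not the authors' formulation.

```python
import math

def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with its predicted depth to a 3D point (assumed pinhole model)."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)

def apply_pose(p, R, t):
    """Move a 3D point by a rigid motion: rotation R (3x3 nested list) plus translation t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def correspondence_field(depth, intr, R, t):
    """Hypothetical 3D corresponding vector per pixel: where its 3D point moves under ego-motion."""
    fx, fy, cx, cy = intr
    field = []
    for v, depth_row in enumerate(depth):
        row = []
        for u, d in enumerate(depth_row):
            p = backproject(u, v, d, fx, fy, cx, cy)
            q = apply_pose(p, R, t)
            row.append(tuple(q[k] - p[k] for k in range(3)))
        field.append(row)
    return field

def edge_aware_smoothness(field, image):
    """Mean L1 difference of neighbouring 3D vectors, down-weighted where image intensity changes."""
    H, W = len(field), len(field[0])
    total, count = 0.0, 0
    for v in range(H):
        for u in range(W):
            for dv, du in ((0, 1), (1, 0)):  # right and lower neighbours
                v2, u2 = v + dv, u + du
                if v2 >= H or u2 >= W:
                    continue
                weight = math.exp(-abs(image[v][u] - image[v2][u2]))
                diff = sum(abs(field[v][u][k] - field[v2][u2][k]) for k in range(3))
                total += weight * diff
                count += 1
    return total / count if count else 0.0
```

Down-weighting the penalty at strong intensity changes lets the field vary across object boundaries, where depth discontinuities usually coincide with image edges, while still encouraging coherent correspondence inside uniform image regions.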

Keywords