IEEE Access (Jan 2022)

Learning Depth Estimation From Memory Infusing Monocular Cues: A Generalization Prediction Approach

  • Yakun Zhou,
  • Jinting Luo,
  • Musen Hu,
  • Tingyong Wu,
  • Jinkuan Zhu,
  • Xingzhong Xiong,
  • Jienan Chen

DOI
https://doi.org/10.1109/ACCESS.2022.3151108
Journal volume & issue
Vol. 10
pp. 21359 – 21369

Abstract


Depth estimation from a single image is a challenging task, yet the field holds promise for autonomous driving and augmented reality. However, prediction accuracy degrades significantly when a trained network is transferred from the training dataset to real scenarios. To address this issue, we propose MonoMeMa, a novel deep architecture based on human monocular cues: drawing on previous visual experience, humans can perceive depth with one eye through cues such as the relative size of objects and light and shadow. Our method simulates the formation and use of human monocular visual memory in three steps. First, MonoMeMa perceives real-world objects and extracts their feature vectors (encoding). Second, it maintains and replaces the extracted feature vectors over time (storing). Finally, it combines the query objects' feature vectors with memory to infer depth information (retrieving). Simulation results show that our model achieves state-of-the-art results on the KITTI driving dataset. Moreover, MonoMeMa exhibits remarkable generalization when migrated to other driving datasets without any fine-tuning.
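The three-step encode/store/retrieve memory described in the abstract can be sketched as a minimal NumPy module. This is an illustrative assumption, not the paper's actual architecture: the encoder, the FIFO replacement policy, and the softmax-weighted retrieval are all stand-ins chosen for clarity.

```python
import numpy as np

class MonocularMemory:
    """Hypothetical sketch of an encode/store/retrieve visual memory,
    loosely following the three steps named in the abstract."""

    def __init__(self, capacity=128):
        self.capacity = capacity
        self.slots = []  # stored object feature vectors

    def encode(self, obj_patch):
        # Stand-in encoder: flatten an object patch and L2-normalize it.
        v = np.asarray(obj_patch, dtype=float).ravel()
        return v / (np.linalg.norm(v) + 1e-8)

    def store(self, feature):
        # Maintain a bounded memory; replace the oldest entry when full (FIFO).
        if len(self.slots) >= self.capacity:
            self.slots.pop(0)
        self.slots.append(feature)

    def retrieve(self, query):
        # Recall memory as a similarity-weighted (softmax) combination,
        # then fuse it with the query feature for a downstream depth head.
        M = np.stack(self.slots)                      # (n, d)
        sims = M @ query                              # cosine similarities
        weights = np.exp(sims) / np.exp(sims).sum()   # attention weights
        recalled = weights @ M                        # (d,)
        return np.concatenate([query, recalled])      # fused (2d,) vector
```

In practice the encoder would be a learned CNN and the fused vector would feed a depth-regression head; the sketch only shows how a bounded memory can be queried by feature similarity.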
