Applied Sciences (Jul 2024)

Monocular Depth Estimation Based on Dilated Convolutions and Feature Fusion

  • Hang Li,
  • Shuai Liu,
  • Bin Wang,
  • Yuanhao Wu

DOI
https://doi.org/10.3390/app14135833
Journal volume & issue
Vol. 14, no. 13
p. 5833

Abstract


Depth estimation is a prominent research focus in computer vision. Existing depth estimation methods that rely on LiDAR (Light Detection and Ranging) typically yield sparse depth data and incur high hardware costs. Multi-view image-matching techniques require prior knowledge of camera intrinsic parameters and frequently suffer from depth inconsistency, loss of detail, and blurred edges. To address these challenges, this study introduces a monocular depth estimation approach based on an end-to-end convolutional neural network. Specifically, a DNET backbone is developed that incorporates dilated convolution and feature fusion mechanisms into the network architecture. By integrating semantic information from multiple receptive fields and levels, the model's feature-extraction capacity is augmented, improving its sensitivity to subtle depth variations within the image. Furthermore, we introduce a loss function optimization algorithm designed to address class imbalance, improving the model's overall predictive accuracy. Training and validation on the NYU Depth-v2 (New York University Depth Dataset Version 2) and KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) datasets demonstrate that our approach outperforms other algorithms across various evaluation metrics.
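The two core ideas named in the abstract, dilated convolution and multi-receptive-field feature fusion, can be illustrated with a minimal NumPy sketch. This is not the paper's DNET architecture (which is not reproduced here); it only shows how dilation enlarges a kernel's receptive field without adding parameters, and how outputs from branches with different dilation rates can be fused. The function name `dilated_conv2d` and the crop-and-sum fusion scheme are illustrative assumptions.

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation=1):
    """Naive 'valid' 2D correlation with a dilation factor (illustrative only).

    Dilation inserts (dilation - 1) gaps between kernel taps, so the
    effective kernel size is k + (k - 1) * (dilation - 1): the receptive
    field grows while the parameter count stays the same.
    """
    k = kernel.shape[0]
    k_eff = k + (k - 1) * (dilation - 1)  # effective receptive field
    h, w = x.shape
    out = np.zeros((h - k_eff + 1, w - k_eff + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Strided slicing samples the input at the dilated tap positions.
            patch = x[i:i + k_eff:dilation, j:j + k_eff:dilation]
            out[i, j] = np.sum(patch * kernel)
    return out

x = np.random.rand(16, 16)
k = np.ones((3, 3)) / 9.0
# Parallel branches with different dilation rates capture context at
# multiple scales; cropping to a common size and summing is one simple
# fusion scheme (an assumption -- the paper's fusion module differs).
b1 = dilated_conv2d(x, k, dilation=1)  # 3x3 receptive field -> 14x14 map
b2 = dilated_conv2d(x, k, dilation=2)  # 5x5 receptive field -> 12x12 map
fused = b1[1:-1, 1:-1] + b2            # crop b1 to 12x12, then sum
```

In a real network each branch would be a learned convolution followed by a nonlinearity, and fusion would typically concatenate feature maps along the channel axis rather than sum them, but the receptive-field arithmetic is the same.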

Keywords