IEEE Access (Jan 2021)
Encoder-Decoder Structure With the Feature Pyramid for Depth Estimation From a Single Image
Abstract
This paper addresses depth estimation from a single monocular image, an ill-posed and inherently ambiguous problem. We propose an encoder-decoder structure with a feature pyramid to predict the depth map from a single RGB image. The feature pyramid is used to detect objects of different scales in the image. The encoder extracts the most representative information from the original image through a series of convolution operations while reducing the resolution of the input; we adopt Res2-50 as the encoder. The decoder uses a novel upsampling structure to increase the output resolution. We also propose a novel loss function that adds a gradient loss and a surface normal loss to the depth loss, so that the network predicts not only the global depth but also the depth of fuzzy edges and small objects. We use Adam as the optimizer to train the network and speed up convergence. Extensive experimental evaluation demonstrates the efficiency and effectiveness of the method, which is competitive with previous methods on the Make3D dataset and outperforms state-of-the-art methods on the NYU Depth v2 dataset.
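To make the structure of the combined loss concrete, the following is a minimal sketch of how a depth term, a gradient term, and a surface normal term might be summed. The abstract does not give the exact form of each term, so everything specific here is an assumption for illustration: the mean absolute error used for the depth term, the forward-difference gradients, the normal defined as (-dz/dx, -dz/dy, 1), the weights `w_grad` and `w_normal`, and the function names themselves. It should not be read as the loss actually used in the paper.

```python
import math


def forward_diffs(depth):
    """Forward differences along x (columns) and y (rows).

    Both outputs live on the shared (H-1) x (W-1) grid so they can be
    combined per pixel. `depth` is a list of equal-length rows (H, W >= 2).
    """
    h, w = len(depth), len(depth[0])
    dx = [[depth[i][j + 1] - depth[i][j] for j in range(w - 1)] for i in range(h - 1)]
    dy = [[depth[i + 1][j] - depth[i][j] for j in range(w - 1)] for i in range(h - 1)]
    return dx, dy


def composite_depth_loss(pred, target, w_grad=1.0, w_normal=1.0):
    """Hypothetical combined loss: depth + w_grad * gradient + w_normal * normal."""
    h, w = len(pred), len(pred[0])

    # Depth term (assumed form): mean absolute depth error over all pixels.
    l_depth = sum(
        abs(pred[i][j] - target[i][j]) for i in range(h) for j in range(w)
    ) / (h * w)

    pdx, pdy = forward_diffs(pred)
    tdx, tdy = forward_diffs(target)
    n = (h - 1) * (w - 1)

    l_grad = 0.0
    l_normal = 0.0
    for i in range(h - 1):
        for j in range(w - 1):
            # Gradient term: penalizes mismatched depth edges.
            l_grad += abs(pdx[i][j] - tdx[i][j]) + abs(pdy[i][j] - tdy[i][j])

            # Normal term: 1 - cosine similarity between the surface
            # normals (-dx, -dy, 1) of the prediction and the ground truth.
            dot = pdx[i][j] * tdx[i][j] + pdy[i][j] * tdy[i][j] + 1.0
            norm_p = math.sqrt(pdx[i][j] ** 2 + pdy[i][j] ** 2 + 1.0)
            norm_t = math.sqrt(tdx[i][j] ** 2 + tdy[i][j] ** 2 + 1.0)
            l_normal += 1.0 - dot / (norm_p * norm_t)

    return l_depth + w_grad * l_grad / n + w_normal * l_normal / n
```

Under these assumptions, a prediction that differs from the ground truth by a constant offset is penalized only by the depth term, whereas a prediction with a misplaced edge is also penalized by the gradient and normal terms. This is the intuition behind adding those terms to recover fuzzy edges and small objects.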
Keywords