IET Computer Vision (Oct 2013)

Fusion of dense spatial features and sparse temporal features for three‐dimensional structure estimation in urban scenes

  • Mohamad Motasem Nawaf,
  • Alain Trémeau

DOI
https://doi.org/10.1049/iet-cvi.2012.0270
Journal volume & issue
Vol. 7, no. 5
pp. 302 – 310

Abstract

The authors present a novel approach to improving three‐dimensional (3D) structure estimation from an image stream in urban scenes. They consider a particular setup in which the camera is installed on a moving vehicle. Applying a traditional structure from motion (SfM) technique in this case yields a poor estimate of the 3D structure for several reasons, including texture‐less images, small baseline variations and dominant forward camera motion. The authors' idea is to introduce the monocular depth cues that exist in a single image and to add temporal constraints on the estimated 3D structure. The scene is modelled as a set of small planar patches obtained using over‐segmentation, and the goal is to estimate the 3D position of each of these planes. The authors propose a fusion scheme that employs a Markov random field model to integrate spatial and temporal depth features. Spatial depth is obtained by learning a set of global and local image features. Temporal depth is obtained via a sparse optical‐flow‐based SfM approach. This reduces the estimation ambiguity by imposing constraints on camera motion. Finally, the authors apply the fusion scheme to produce a unique 3D structure estimate.

Keywords