IEEE Access (Jan 2022)
Multimodal Fusion of Deeply Inferred Point Clouds for 3D Scene Reconstruction Using Cross-Entropy ICP
Abstract
Depth estimation is a crucial step toward 3D scene understanding. Most traditional systems rely on direct sensing of this information by means of photogrammetry or stereo imaging. As scenes become more complex, these modalities are impeded by, for instance, occlusion and imperfect lighting conditions. As a consequence, reconstructed surfaces are normally left with voids due to missing data, and surface regularization is therefore often required as post-processing. With recent advances in deep learning, depth inference from a monocular image has attracted considerable interest, and many convolutional architectures have been proposed to infer depth from a single image with promising results. However, the visual cues learned and generalized by these networks may be ambiguous, resulting in inaccurate estimation. To address these issues, this paper presents an effective method for fusing point clouds extracted from depth values of the same scene, directly measured by an infrared camera and estimated from an RGB image by a modified ResNet-50. To ensure robustness and efficiency in finding correspondences between these point clouds and aligning them, an information-theoretic alignment strategy, called Cross-Entropy ICP (CEICP), is proposed. Experimental results on a public dataset demonstrate that the proposed method outperforms its counterparts while producing good-quality surface renditions of the underlying scene.
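The abstract describes two geometric stages: back-projecting per-pixel depth values into point clouds, and rigidly aligning the measured and inferred clouds. The following is a minimal sketch of those stages, not the authors' CEICP: the cross-entropy correspondence weighting is not specified here, so a standard nearest-neighbour point-to-point ICP (SVD/Kabsch update) stands in for it. The function names and the pinhole intrinsics fx, fy, cx, cy are illustrative assumptions.

# Minimal sketch of the pipeline's geometric stages (assumptions noted above).
import numpy as np
from scipy.spatial import cKDTree

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project an HxW depth map (metres) into an Nx3 point cloud
    using hypothetical pinhole intrinsics fx, fy, cx, cy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.ravel()
    valid = z > 0                              # drop missing-depth pixels (voids)
    x = (u.ravel() - cx) * z / fx
    y = (v.ravel() - cy) * z / fy
    return np.stack([x, y, z], axis=1)[valid]

def icp(source, target, iters=30):
    """Rigidly align `source` to `target`; plain point-to-point ICP,
    standing in for CEICP. Returns (R, t, aligned source)."""
    tree = cKDTree(target)
    src = source.copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    for _ in range(iters):
        _, idx = tree.query(src)               # nearest-neighbour correspondences
        matched = target[idx]
        mu_s, mu_t = src.mean(0), matched.mean(0)
        H = (src - mu_s).T @ (matched - mu_t)  # cross-covariance of centred sets
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:               # guard against a reflection solution
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_t - R @ mu_s
        src = src @ R.T + t                    # apply the incremental transform
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total, src

In use, one cloud would come from the infrared sensor's depth map and the other from the network-inferred depth map of the same view; after alignment, the fused cloud fills voids in the sensed surface with inferred points.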
Keywords