Stereo matching from monocular images using feature consistency

Zhongjian Lu; An Chen; Hongxia Gao; Langwen Zhang; Congyu Zhang; Yang Yang

doi:10.1049/ipr2.13114

IET Image Processing (Aug 2024)

Affiliations

Zhongjian Lu: School of Automation Science and Engineering, South China University of Technology, Guangzhou, China
An Chen: School of Automation Science and Engineering, South China University of Technology, Guangzhou, China
Hongxia Gao: School of Automation Science and Engineering, South China University of Technology, Guangzhou, China
Langwen Zhang: School of Automation Science and Engineering, South China University of Technology, Guangzhou, China
Congyu Zhang: School of Automation Science and Engineering, South China University of Technology, Guangzhou, China
Yang Yang: School of Automation Science and Engineering, South China University of Technology, Guangzhou, China

DOI: https://doi.org/10.1049/ipr2.13114
Journal volume & issue: Vol. 18, no. 10
pp. 2540 – 2552

Abstract

Synthetic images facilitate the training of stereo matching models. However, synthetic images may suffer from image distortion, domain bias, and stereo mismatch, which significantly restricts the widespread use of stereo matching models in the real world. The first goal of this paper is to synthesize real-looking images that minimize the domain bias between synthesized and real images. For this purpose, sharpened disparity maps are produced from a single real image. Stereo image pairs are then synthesized in the proposed pipeline from these imperfect disparity maps and the single real image. Although the synthesized images are made as realistic as possible, their domain style remains very different from that of real images. Thus, the second goal is to enhance the domain generalization ability of the stereo matching network. To that end, the feature extraction layer is replaced with a teacher–student model, and a constraint on binocular contrast features is imposed on the output of the model. When tested on the KITTI, ETH3D, and Middlebury datasets, the accuracy of the method exceeds that of traditional methods by at least 30%. Experiments demonstrate that the approaches are general and can be conveniently embedded into existing stereo networks.
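To make the synthesis step concrete, the following is a minimal sketch of one common way to build a second view from a single image and a disparity map: forward warping with a nearest-surface rule. The abstract does not specify the warping procedure, so the function name `synthesize_right_view`, the rounding of disparities to whole pixels, and the hole mask are all assumptions made for illustration, not the authors' pipeline.

```python
import numpy as np

def synthesize_right_view(left, disparity):
    """Forward-warp a left image into a hypothetical right view.

    left:      (H, W, C) image array.
    disparity: (H, W) array of non-negative disparities in pixels.
    Returns (right, hole_mask); hole_mask is True where no source pixel landed.
    Illustrative sketch only, not the method from the paper.
    """
    h, w = disparity.shape
    right = np.zeros_like(left)
    # Largest disparity written so far at each target pixel, so that nearer
    # surfaces (larger disparity) overwrite farther ones.
    best = np.full((h, w), -1.0)
    for y in range(h):
        for x in range(w):
            d = float(disparity[y, x])
            # A point seen at column x in the left view appears at x - d
            # in the right view (rectified stereo assumption).
            xr = int(round(x - d))
            if 0 <= xr < w and d > best[y, xr]:
                best[y, xr] = d
                right[y, xr] = left[y, x]
    hole_mask = best < 0
    return right, hole_mask
```

Pixels flagged in `hole_mask` correspond to regions occluded in the source image; a practical pipeline would need to fill them (for example by inpainting), which this sketch leaves out.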

Published in IET Image Processing

ISSN: 1751-9659 (Print); 1751-9667 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Technology: Photography; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/17519667
