Learning accurate template matching with differentiable coarse-to-fine correspondence refinement

Zhirui Gao; Renjiao Yi; Zheng Qin; Yunfan Ye; Chenyang Zhu; Kai Xu

doi:10.1007/s41095-023-0333-9

Computational Visual Media (Jan 2024)


Affiliations

Zhirui Gao: College of Computer, National University of Defense Technology
Renjiao Yi: College of Computer, National University of Defense Technology
Zheng Qin: College of Computer, National University of Defense Technology
Yunfan Ye: College of Computer, National University of Defense Technology
Chenyang Zhu: College of Computer, National University of Defense Technology
Kai Xu: College of Computer, National University of Defense Technology

Journal volume & issue: Vol. 10, no. 2
pp. 309 – 330

Abstract


Template matching is a fundamental task in computer vision and has been studied for decades. It plays an essential role in the manufacturing industry for estimating the poses of different parts, facilitating downstream tasks such as robotic grasping. Existing methods fail when the template and source images have different modalities, cluttered backgrounds, or weak textures. They also rarely consider geometric transformations via homographies, which commonly exist even for planar industrial parts. To tackle these challenges, we propose an accurate template matching method based on differentiable coarse-to-fine correspondence refinement. We use an edge-aware module to overcome the domain gap between the mask template and the grayscale image, allowing robust matching. An initial warp is estimated using coarse correspondences based on novel structure-aware information provided by transformers. This initial alignment is passed to a refinement network that uses the reference and aligned images to obtain sub-pixel level correspondences, which are used to compute the final geometric transformation. Extensive evaluation shows our method to be significantly better than state-of-the-art methods and baselines, providing good generalization ability and visually plausible results even on unseen real data.
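The abstract describes estimating an initial warp from coarse correspondences and then a final homography from refined sub-pixel correspondences. The sketch below illustrates only that geometric step, using a standard normalized least-squares DLT in plain Python; it is not the authors' network or code. The function names (`estimate_homography`, `coarse_to_fine`), the synthetic homography, and the use of integer-rounded points as stand-ins for coarse matches are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Mat3 = List[List[float]]


def matmul(a: Mat3, b: Mat3) -> Mat3:
    """3x3 product a @ b: applies b first, then a."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def apply_h(h: Mat3, p: Point) -> Point:
    """Maps a 2D point through a homography, including the perspective divide."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solves a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("degenerate correspondences")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def normalizer(pts: Sequence[Point]) -> Mat3:
    """Similarity moving the centroid to the origin and the mean distance to sqrt(2)."""
    cx = sum(p[0] for p in pts) / len(pts)
    cy = sum(p[1] for p in pts) / len(pts)
    s = math.sqrt(2) / (sum(math.hypot(p[0] - cx, p[1] - cy) for p in pts) / len(pts))
    return [[s, 0.0, -s * cx], [0.0, s, -s * cy], [0.0, 0.0, 1.0]]


def estimate_homography(src: Sequence[Point], dst: Sequence[Point]) -> Mat3:
    """Least-squares normalized DLT (h33 fixed to 1) from >= 4 correspondences."""
    ts, td = normalizer(src), normalizer(dst)
    rows, rhs = [], []
    for p, q in zip(src, dst):
        (x, y), (u, v) = apply_h(ts, p), apply_h(td, q)
        # u * (h6 x + h7 y + 1) = h0 x + h1 y + h2, and likewise for v.
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        rhs.append(v)
    # Normal equations (A^T A) h = A^T b for the 8 unknowns.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    atb = [sum(r[i] * rb for r, rb in zip(rows, rhs)) for i in range(8)]
    h = solve(ata, atb)
    hn = [h[0:3], h[3:6], [h[6], h[7], 1.0]]
    # Undo the normalization: H = td^-1 @ hn @ ts, then rescale so h33 = 1.
    s = td[0][0]
    td_inv = [[1 / s, 0.0, -td[0][2] / s], [0.0, 1 / s, -td[1][2] / s], [0.0, 0.0, 1.0]]
    hm = matmul(td_inv, matmul(hn, ts))
    return [[e / hm[2][2] for e in row] for row in hm]


def coarse_to_fine(src: Sequence[Point], coarse_dst: Sequence[Point],
                   fine_dst: Sequence[Point]) -> Mat3:
    """Hypothetical two-stage fit: coarse warp, then a residual warp on aligned points."""
    h_coarse = estimate_homography(src, coarse_dst)
    aligned = [apply_h(h_coarse, p) for p in src]
    h_residual = estimate_homography(aligned, fine_dst)
    return matmul(h_residual, h_coarse)


# Synthetic example: integer-rounded matches stand in for coarse correspondences,
# exact projections stand in for refined sub-pixel ones.
h_true = [[1.1, 0.05, 3.0], [0.02, 0.95, -2.0], [1e-4, 2e-4, 1.0]]
src = [(float(x), float(y)) for x in range(0, 101, 25) for y in range(0, 101, 25)]
true_dst = [apply_h(h_true, p) for p in src]
coarse = [(float(round(u)), float(round(v))) for u, v in true_dst]
h_final = coarse_to_fine(src, coarse, true_dst)
err = max(math.dist(apply_h(h_final, p), q) for p, q in zip(src, true_dst))
print(f"max reprojection error after refinement: {err:.2e} px")
```

Here the refined matches are supplied directly, so the example only shows how a residual homography fitted on the aligned points composes with the initial warp; in the described pipeline, the sub-pixel correspondences are instead predicted by the refinement network from the reference and aligned images. Normalizing the points before the DLT keeps the linear system well conditioned when coordinates are in pixels.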

Published in Computational Visual Media

ISSN: 2096-0433 (Print); 2096-0662 (Online)
Publisher: SpringerOpen
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://www.springer.com/41095