Journal of King Saud University: Computer and Information Sciences (Apr 2025)

Hierarchical grid-constrained fusion network for image stitching

  • Yongqin Zhang,
  • Baojie Ruan,
  • Linge Du,
  • Liangjiang Li,
  • Zhan Li,
  • Xiaofeng Wang,
  • Meng Wu,
  • Jinsheng Xiao

DOI
https://doi.org/10.1007/s44443-025-00005-6
Journal volume & issue
Vol. 37, no. 3
pp. 1 – 24

Abstract

The digitization of ancient murals is crucial for preserving, passing on, and making use of cultural heritage. Because murals cover large areas, digitization typically involves capturing images in segments and then stitching them together. However, existing image stitching techniques are limited in efficiency, generalization, and robustness, which hinders their practical use. To address these problems and improve stitching performance, this paper proposes an unsupervised hierarchical grid-constrained fusion network. The model consists of two main modules: image alignment and image synthesis. The image alignment module incorporates prior knowledge, such as feature pyramids, attention mechanisms, and context dependencies, to facilitate feature extraction. It also includes a multi-scale grid homography generation component that uses region masks to create a deformation field. In addition, a stitching-domain transformation unit deforms the input reference and target images to ensure spatial consistency. In the image synthesis module, a progressive inference fusion network reduces the complex image fusion problem to a multi-granularity image synthesis problem. This framework uses two cascaded encoder-decoder network units to merge information progressively from coarse to fine granularity, producing high-resolution stitched images. Experimental results demonstrate that the proposed model exhibits superior robustness on both public and custom image datasets and generally outperforms state-of-the-art image stitching methods, especially in subjective visual assessments.
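
As a rough illustration of what a grid-based deformation field can look like, the sketch below maps the vertices of a regular mesh through a global 3x3 homography and then adds per-vertex residual offsets. This is not the authors' network or code: the function names (`make_grid`, `warp_points`, `deform_grid`), the mesh resolution, and the "global homography plus local offsets" decomposition are all assumptions made for illustration, and in the paper such quantities would be predicted by the alignment module rather than supplied by hand.

```python
import numpy as np

def make_grid(width, height, cols, rows):
    """Return mesh vertices of shape (rows + 1, cols + 1, 2) as (x, y) pairs."""
    xs = np.linspace(0.0, width, cols + 1)
    ys = np.linspace(0.0, height, rows + 1)
    gx, gy = np.meshgrid(xs, ys)
    return np.stack([gx, gy], axis=-1)

def warp_points(H, pts):
    """Apply a 3x3 homography H to an array of (x, y) points of shape (..., 2)."""
    flat = pts.reshape(-1, 2)
    homog = np.hstack([flat, np.ones((flat.shape[0], 1))])  # to homogeneous coords
    mapped = homog @ H.T
    mapped = mapped[:, :2] / mapped[:, 2:3]                  # perspective divide
    return mapped.reshape(pts.shape)

def deform_grid(H, grid, offsets):
    """Hypothetical coarse-to-fine step: global homography, then local offsets."""
    return warp_points(H, grid) + offsets

# Example: a pure translation with zero local offsets shifts every vertex equally.
H = np.array([[1.0, 0.0, 5.0],
              [0.0, 1.0, -3.0],
              [0.0, 0.0, 1.0]])
grid = make_grid(width=4, height=2, cols=2, rows=1)
warped = deform_grid(H, grid, np.zeros_like(grid))
```

The deformed vertices could then drive a piecewise warp of the target image. The abstract does not specify how the dense deformation field is interpolated from the grid, so that step is left out here.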

Keywords