IET Image Processing (Apr 2023)
Multi‐scale homography estimation based on dual feature aggregation transformer
Abstract
The accuracy of registration in an image stitching task directly affects the performance of subsequent stages. Traditional registration methods rely heavily on feature quality when calculating the homography matrix, so alignment fails in low‐texture or low‐overlap scenes where too few features can be extracted. Existing DNN‐based methods for homography estimation are more robust across scenes, but previous work usually employs an overly simple convolutional network to regress the homography directly, ignoring the redundant information contained in the feature maps, so their prediction accuracy is inferior to that of traditional methods in simple scenarios. To overcome the disadvantages of both approaches, a multi‐scale structure is proposed that extracts feature maps at three scales, and two modules are designed to handle the matrix prediction at different levels. The dual feature aggregation transformer (DFA‐T) module analyzes semantic information in the high‐level features to accomplish coarse‐grained alignment, while the Contextual Correlation module operates on the bottom level to accomplish more accurate alignment. Experiments demonstrate that this method provides more accurate alignment than existing state‐of‐the‐art DNN‐based methods and outperforms traditional algorithms, with more stable results, in some extreme scenarios.
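The coarse‐to‐fine idea in the abstract relies on standard homography bookkeeping: an estimate made on a downsampled feature level has to be re‐expressed at the finer resolution and then combined with the refinement predicted there. The following is a minimal sketch of that geometry only, not the authors' network. The function names are hypothetical, two levels are used instead of three for brevity, and it assumes pixel coordinates scale uniformly between levels.

```python
import numpy as np

def rescale_homography(H, scale):
    """Re-express a homography estimated at a coarse level in a level
    whose pixel coordinates are `scale` times larger (assumed uniform)."""
    S = np.diag([scale, scale, 1.0])
    return S @ H @ np.linalg.inv(S)

def compose_coarse_to_fine(H_coarse, H_residual):
    """Apply the coarse alignment first, then the residual refinement.
    Assumes the bottom-right entry of the product is non-zero."""
    H = H_residual @ H_coarse
    return H / H[2, 2]

def warp_point(H, xy):
    """Map a 2D point through H using homogeneous coordinates."""
    p = H @ np.array([xy[0], xy[1], 1.0])
    return p[:2] / p[2]

# Illustrative values (not from the paper): a coarse estimate made at
# 1/4 resolution, followed by a small full-resolution refinement.
H_low = np.array([[1.0, 0.0, 2.0],
                  [0.0, 1.0, 3.0],
                  [0.0, 0.0, 1.0]])
H_refine = np.array([[1.0, 0.0, 0.5],
                     [0.0, 1.0, -0.25],
                     [0.0, 0.0, 1.0]])

H_coarse_full = rescale_homography(H_low, 4.0)
H_total = compose_coarse_to_fine(H_coarse_full, H_refine)
warped = warp_point(H_total, (10.0, 20.0))
```

In this example the coarse translation of (2, 3) pixels at quarter resolution becomes (8, 12) pixels at full resolution, and the refinement then only has to account for the remaining sub‐pixel offset.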
Keywords