International Journal of Applied Earth Observations and Geoinformation (Dec 2023)
Coarse-to-fine matching via cross fusion of satellite images
Abstract
The registration of multimodal satellite images is an essential prerequisite for acquiring complementary observational data. However, non-linear radiometric differences between imaging modalities complicate keypoint detection, which in turn makes multimodal satellite images difficult to match. In this paper, a dual-branch cross fusion network (DF-Net) is proposed for satellite image registration. DF-Net applies self-attention to a pair of images to produce cross-modal fused feature descriptions. First, the reference and sensed images are fed into the dual-branch network, which generates feature descriptions at both high and low resolution. Next, the feature descriptions are matched on the low-resolution feature map to establish coarse correspondences. These coarse correspondences are then mapped onto the higher-resolution feature map, where a fine matching result is generated for each coarse correspondence. Extensive qualitative and quantitative evaluations were conducted on three satellite image datasets covering a diverse range of scenarios. The average Repeatability (Rep.), Mean Matching Accuracy (MMA), and Root-Mean-Square Error (RMSE) of DF-Net on three large-scale satellite images were 0.71, 0.65, and 2.34, respectively. These results demonstrate the effectiveness of the proposed method for cross-modal matching.
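The coarse-to-fine procedure summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the DF-Net implementation: the feature maps are assumed to be precomputed arrays of shape (H, W, C), mutual nearest-neighbour matching on cosine similarity stands in for whatever matching criterion the network uses, and the function names `coarse_match` and `refine_match`, as well as the integer `scale` between the two resolutions, are hypothetical choices made for illustration only.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale descriptors to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def coarse_match(feat_a, feat_b):
    """Mutual nearest-neighbour matching on low-resolution maps of shape (H, W, C).

    Assumption: this criterion is a stand-in, not the matcher used by DF-Net.
    Returns two (N, 2) arrays of (row, col) cell indices, one per image.
    """
    _, wa, c = feat_a.shape
    wb = feat_b.shape[1]
    da = l2_normalize(feat_a.reshape(-1, c))
    db = l2_normalize(feat_b.reshape(-1, c))
    sim = da @ db.T                      # pairwise cosine similarity
    nn_ab = sim.argmax(axis=1)           # best cell in B for each cell in A
    nn_ba = sim.argmax(axis=0)           # best cell in A for each cell in B
    idx_a = np.arange(da.shape[0])
    mutual = nn_ba[nn_ab] == idx_a       # keep only mutually consistent pairs
    ia, ib = idx_a[mutual], nn_ab[mutual]
    cells_a = np.stack([ia // wa, ia % wa], axis=1)
    cells_b = np.stack([ib // wb, ib % wb], axis=1)
    return cells_a, cells_b


def refine_match(fine_a, fine_b, cells_a, cells_b, scale):
    """Refine each coarse match inside the corresponding high-resolution window.

    The centre pixel of each coarse cell in A is compared against every pixel
    of the matched scale x scale window in B (an illustrative choice).
    """
    fa, fb = l2_normalize(fine_a), l2_normalize(fine_b)
    pts_a, pts_b = [], []
    for (ra, ca), (rb, cb) in zip(cells_a, cells_b):
        pa = (ra * scale + scale // 2, ca * scale + scale // 2)
        window = fb[rb * scale:(rb + 1) * scale, cb * scale:(cb + 1) * scale]
        score = window @ fa[pa]          # (scale, scale) similarity map
        dr, dc = np.unravel_index(score.argmax(), score.shape)
        pts_a.append(pa)
        pts_b.append((rb * scale + dr, cb * scale + dc))
    return np.array(pts_a), np.array(pts_b)
```

Given matched point arrays, a registration error such as the RMSE reported above would be computed against ground-truth correspondences, which this sketch does not model.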