IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2023)
Hypergraph-Enhanced Textual-Visual Matching Network for Cross-Modal Remote Sensing Image Retrieval via Dynamic Hypergraph Learning
Abstract
Cross-modal remote sensing (RS) image retrieval aims to retrieve RS images using queries from other modalities (e.g., text) and vice versa. The relationships between objects in an RS image are complex, and the distribution of multiple object types is uneven, which makes matching against the query text inaccurate and thus restricts retrieval performance. Previous methods generally focus on feature matching between the RS image and the text, and rarely model the relationships among the features of the RS image itself. A hypergraph, in which a single hyperedge can connect multiple vertices, is an extension of a regular graph and has attracted extensive attention for its superiority in representing high-order relationships. Inspired by these advantages, this work proposes a hypergraph-enhanced textual-visual matching network (HyperMatch) to circumvent inaccurate matching between the RS image and the query text. Specifically, a multiscale RS image hypergraph network is designed to model the complex relationships among RS image features, grouping valuable and redundant features into different hyperedges. In addition, a hypergraph construction and update method for RS images is designed: to construct the hypergraph, the features of an RS image serve as vertices, and cosine similarity is the metric used to measure the correlation between them; vertex and hyperedge attention mechanisms are then introduced to update the hypergraph dynamically, alternately refining vertices and hyperedges. Quantitative and qualitative experiments on the RSICD and RSITMD datasets verify the effectiveness of the proposed method for cross-modal RS image retrieval.
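The construction step described above, where image features act as vertices and cosine similarity measures their correlation, can be sketched as follows. This is a minimal illustration using a common kNN-style grouping (each hyperedge connects a vertex to its most cosine-similar neighbors); the paper's exact grouping rule, feature extractor, and hyperedge count are not specified in the abstract, so all names and parameters here are assumptions.

```python
import numpy as np

def cosine_knn_hypergraph(features, k=2):
    """Build a 0/1 hypergraph incidence matrix from feature vectors.

    Each row of `features` is one vertex (e.g., a region feature of an
    RS image). For every vertex we form one hyperedge connecting it to
    its k most cosine-similar vertices (itself included) -- a common
    kNN-style construction, used here only as an illustrative stand-in.
    Returns an (n_vertices x n_hyperedges) incidence matrix H, where
    H[v, e] = 1 means vertex v belongs to hyperedge e.
    """
    # L2-normalize rows so a dot product equals cosine similarity.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    normed = features / np.clip(norms, 1e-12, None)
    sim = normed @ normed.T  # pairwise cosine-similarity matrix

    n = features.shape[0]
    H = np.zeros((n, n))
    for e in range(n):
        # The k+1 largest similarities: vertex e itself plus k neighbors.
        nearest = np.argsort(sim[e])[::-1][: k + 1]
        H[nearest, e] = 1.0
    return H

# Toy example: 6 hypothetical region features of dimension 8.
feats = np.random.default_rng(0).normal(size=(6, 8))
H = cosine_knn_hypergraph(feats, k=2)
print(H.shape)        # (6, 6): one hyperedge per vertex
print(H.sum(axis=0))  # each hyperedge contains k + 1 = 3 vertices
```

In dynamic hypergraph learning, an incidence matrix like `H` would then be refined as the vertex features are updated by the attention mechanisms the abstract mentions.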
Keywords