Hypergraph-Enhanced Textual-Visual Matching Network for Cross-Modal Remote Sensing Image Retrieval via Dynamic Hypergraph Learning

Fanglong Yao; Xian Sun; Nayu Liu; Changyuan Tian; Liangyu Xu; Leiyi Hu; Chibiao Ding

doi:10.1109/JSTARS.2022.3226325

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2023)

Affiliations

Fanglong Yao: Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China
Xian Sun: Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China
Nayu Liu: Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China
Changyuan Tian: Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China
Liangyu Xu: Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China
Leiyi Hu: Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China
Chibiao Ding: Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China

DOI: https://doi.org/10.1109/JSTARS.2022.3226325
Journal volume & issue: Vol. 16, pp. 688–701

Abstract

Cross-modal remote sensing (RS) image retrieval aims to retrieve RS images using other modalities (e.g., text) and vice versa. The relationships between objects in an RS image are complex: multiple types of objects are unevenly distributed, which makes matching with the query text inaccurate and in turn limits retrieval performance. Previous methods generally focus on feature matching between the RS image and the text and rarely model the relationships among the features of the RS image itself. A hypergraph, in which a hyperedge can connect multiple vertices, extends the regular graph and has attracted extensive attention for its ability to represent high-order relationships. Inspired by these advantages, this work proposes a hypergraph-enhanced textual-visual matching network (HyperMatch) to address the inaccurate matching between the RS image and the query text. Specifically, a multiscale RS image hypergraph network is designed to model the complex relationships between features of the RS image, grouping valuable and redundant features into different hyperedges. In addition, a hypergraph construction and update method for RS images is designed. To construct a hypergraph, the features of an RS image serve as vertices, and cosine similarity is the metric used to measure the correlation between them. Vertex and hyperedge attention mechanisms are introduced to update the hypergraph dynamically, alternating between vertex updates and hyperedge updates. Quantitative and qualitative experiments on the RSICD and RSITMD datasets verify the effectiveness of the proposed method for cross-modal RS image retrieval.
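
The construction and update steps described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. The abstract states only that image features act as vertices, that cosine similarity measures their correlation, and that vertices and hyperedges are updated alternately with attention. Everything else here is an assumption made for illustration: hyperedges are formed by k-nearest neighbors, attention is a plain dot-product softmax with no learned parameters, and the function names and the parameter `k` are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between row vectors."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    return unit @ unit.T


def knn_incidence(features, k):
    """Incidence matrix H of shape (n_vertices, n_hyperedges).

    Assumption: hyperedge j contains vertex j plus its k most
    cosine-similar vertices, so there is one hyperedge per vertex.
    """
    sim = cosine_similarity_matrix(features)
    n = sim.shape[0]
    np.fill_diagonal(sim, -np.inf)             # exclude self from the neighbor search
    neighbors = np.argsort(-sim, axis=1)[:, :k]
    H = np.zeros((n, n))
    H[np.arange(n), np.arange(n)] = 1.0        # each vertex belongs to its own hyperedge
    H[neighbors.ravel(), np.repeat(np.arange(n), k)] = 1.0
    return H


def masked_softmax(scores, mask, axis):
    """Softmax along `axis`, restricted to entries where `mask` is True."""
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=axis, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=axis, keepdims=True)


def alternating_update(features, H):
    """One round of vertex -> hyperedge -> vertex updates.

    Assumption: attention scores are dot products, with no learned weights.
    """
    mask = H > 0
    # Vertex attention: each hyperedge is a weighted sum of its member vertices.
    centroids = (H.T @ features) / H.sum(axis=0)[:, None]
    vertex_attn = masked_softmax(features @ centroids.T, mask, axis=0)
    edges = vertex_attn.T @ features
    # Hyperedge attention: each vertex is a weighted sum of its hyperedges.
    edge_attn = masked_softmax(features @ edges.T, mask, axis=1)
    return edge_attn @ edges, edges


# Toy usage: four 2-D "region features" forming two obvious groups.
toy = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
H = knn_incidence(toy, k=1)
new_vertices, hyperedges = alternating_update(toy, H)
```

In this sketch every vertex spawns one hyperedge, so `H` is square. A trained model would normally apply learned projections before computing attention scores and could rebuild `H` from the updated features after each round, which is one way to read "dynamic" hypergraph learning.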

Published in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

ISSN: 1939-1404 (Print); 2151-1535 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Ocean engineering; Science: Physics: Geophysics. Cosmic physics
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=4609443
