Text-Enhanced Graph Attention Hashing for Cross-Modal Retrieval

Qiang Zou; Shuli Cheng; Anyu Du; Jiayi Chen

doi:10.3390/e26110911

Entropy (Oct 2024)

Affiliations

Qiang Zou: College of Computer Science and Technology, Xinjiang University, Urumqi 830046, China
Shuli Cheng: College of Computer Science and Technology, Xinjiang University, Urumqi 830046, China
Anyu Du: College of Computer Science and Technology, Xinjiang University, Urumqi 830046, China
Jiayi Chen: College of Computer Science and Technology, Xinjiang University, Urumqi 830046, China

DOI: https://doi.org/10.3390/e26110911
Journal volume & issue: Vol. 26, no. 11, p. 911

Abstract

Deep hashing technology, known for its low-cost storage and rapid retrieval, has become a focal point in cross-modal retrieval research as multimodal data continue to grow. However, existing supervised methods often overlook noisy labels and multiscale features in datasets of different modalities, leading to higher information entropy in the generated hash codes and features, which reduces retrieval performance. The variation in text annotation information across datasets further increases the information entropy during text feature extraction, resulting in suboptimal outcomes. Consequently, reducing the information entropy in text feature extraction, supplementing text feature information, and enhancing the retrieval efficiency of large-scale media data are critical challenges in cross-modal retrieval research. To tackle these challenges, this paper introduces the Text-Enhanced Graph Attention Hashing for Cross-Modal Retrieval (TEGAH) framework. TEGAH incorporates a deep text feature extraction network and a multiscale label region fusion network to minimize information entropy and optimize feature extraction. Additionally, a Graph-Attention-based modal feature fusion network is designed to efficiently integrate multimodal information, enhance the affinity of the network for different modalities, and retain more semantic information. Extensive experiments on three multilabel datasets demonstrate that the TEGAH framework significantly outperforms state-of-the-art cross-modal hashing methods.
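
The abstract's point about low-cost storage and rapid retrieval can be illustrated with a minimal sketch of generic hash-code retrieval. This is not the TEGAH method: it assumes that real-valued features from any modality are binarized by sign and that a query is answered by ranking stored codes by Hamming distance. The function names (sign_binarize, hamming_distance, rank_by_hamming) and the 4-bit feature values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def sign_binarize(features: Sequence[float]) -> int:
    """Pack the signs of a real-valued feature vector into an integer hash code.

    Assumption for this sketch: a non-negative value maps to bit 1, a negative
    value to bit 0. The first feature becomes the most significant bit.
    """
    code = 0
    for value in features:
        code = (code << 1) | (1 if value >= 0 else 0)
    return code


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions at which two hash codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: Sequence[int]) -> List[int]:
    """Return database indices ordered from nearest to farthest hash code."""
    return sorted(
        range(len(database)),
        key=lambda i: hamming_distance(query, database[i]),
    )


# Hypothetical example: one text query matched against three image codes.
text_query = sign_binarize([0.7, -0.2, 0.1, -0.9])
image_codes = [
    sign_binarize([-0.5, 0.3, -0.4, 0.8]),
    sign_binarize([0.6, -0.1, 0.2, -0.3]),
    sign_binarize([0.9, -0.4, -0.2, -0.7]),
]
ranking = rank_by_hamming(text_query, image_codes)
print(ranking)  # prints [1, 2, 0]
```

Each item is stored as a handful of bits and compared with an XOR plus a bit count, which is why hashing-based retrieval scales to large media collections. In a cross-modal setting the text and image encoders would need to be trained so that semantically related items receive nearby codes; that training step is outside the scope of this sketch.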

Published in Entropy

ISSN: 1099-4300 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Astronomy: Astrophysics; Science: Physics
Website: http://www.mdpi.com/journal/entropy