IEEE Access (Jan 2024)

Deep Hashing Similarity Learning for Cross-Modal Retrieval

  • Ying Ma,
  • Meng Wang,
  • Guangyun Lu,
  • Yajun Sun

DOI: https://doi.org/10.1109/ACCESS.2024.3352434
Journal volume & issue: Vol. 12, pp. 8609–8618

Abstract

In the realm of cross-modal retrieval research, hash methods have garnered significant attention from scholars due to their high retrieval efficiency and low storage costs. However, these methods often sacrifice a considerable amount of semantic information when mapping multi-modal features to a low-dimensional space. Moreover, hash learning has focused primarily on inter-modal similarity learning, neglecting the importance of intra-modal similarity learning. To address these issues, this paper proposes a novel cross-modal hash method called Deep Hashing Similarity Learning for Cross-modal Retrieval (DHSL). DHSL incorporates relation networks into the hash method, enabling pairwise matching between images and texts. This approach effectively bridges the heterogeneity gap between images and texts while simultaneously emphasizing the intra-modal similarity information within each modality. The result is a hash similarity matrix that captures both inter-modal similarity and intra-modal discriminability. Because transforming high-dimensional features into hash codes often discards important semantic information, we introduce a feature selector to enhance the features: it selects distinctive features from the original feature set and combines them with the low-dimensional features to supplement the semantic information. In addition, we introduce a weighted cosine triplet loss and a quantization loss to constrain the hash representations in Hamming space, thereby learning high-quality hash codes. Comprehensive experimental results on two benchmark datasets, NUS-WIDE and MIRFlickr25K, demonstrate that DHSL outperforms state-of-the-art cross-modal hash methods.
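
To make the loss terms named in the abstract more concrete, below is a minimal PyTorch sketch of a weighted cosine triplet loss combined with a quantization loss. The margin, the scalar weight, the sign-based binarization target, and all tensor names are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch only: a weighted cosine triplet loss plus a quantization
# loss of the general kind used to constrain hash representations. The exact
# weighting scheme and margin in DHSL may differ.
import torch
import torch.nn.functional as F

def weighted_cosine_triplet_loss(anchor, positive, negative, margin=0.5, weight=1.0):
    """Pull the anchor toward the positive and push it away from the negative
    by at least `margin` in cosine similarity, scaled by `weight`."""
    sim_pos = F.cosine_similarity(anchor, positive, dim=1)
    sim_neg = F.cosine_similarity(anchor, negative, dim=1)
    return weight * F.relu(margin + sim_neg - sim_pos).mean()

def quantization_loss(h):
    """Penalize the gap between continuous hash representations and the
    binary codes sign(h) they will eventually be quantized to."""
    return F.mse_loss(h, torch.sign(h).detach())

# Toy usage: image-side hash features as anchors, text-side features as
# positive/negative counterparts (hypothetical shapes: batch of 8, 64-bit codes).
torch.manual_seed(0)
img_h   = torch.randn(8, 64, requires_grad=True)  # anchor: image hash features
txt_pos = torch.randn(8, 64)                      # positive: matching text
txt_neg = torch.randn(8, 64)                      # negative: non-matching text

loss = weighted_cosine_triplet_loss(img_h, txt_pos, txt_neg) + quantization_loss(img_h)
loss.backward()
print(loss.item())
```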

Keywords