IEEE Access (Jan 2023)

Transformer-Based Discriminative and Strong Representation Deep Hashing for Cross-Modal Retrieval

  • Suqing Zhou,
  • Yu Han,
  • Ning Chen,
  • Siyu Huang,
  • Kostromitin Konstantin Igorevich,
  • Jia Luo,
  • Peiying Zhang

DOI
https://doi.org/10.1109/ACCESS.2023.3339581
Journal volume & issue
Vol. 11, pp. 140041–140055

Abstract

Cross-modal hashing retrieval has attracted extensive attention due to its low storage requirements and high retrieval efficiency. In particular, the key to improving its performance is to exploit the correlations among different modalities more fully and to generate more discriminative representations. Moreover, Transformer-based models have been widely adopted in fields such as natural language processing owing to their powerful ability to model contextual information. Motivated by these observations, we propose a Transformer-based Discriminative and Strong Representation Deep Hashing (TDSRDH). For the text modality, since the sequential relations between words imply semantic relations that are not independent, we encode the text with a Transformer-based encoder to obtain a strong representation. In addition, we propose a triple-supervised loss built on the commonly used pairwise loss and quantization loss. The pairwise and quantization losses ensure that the learned features and hash codes preserve the similarity of the original data during learning, while the third term pulls similar instances closer together and pushes dissimilar instances farther apart. TDSRDH can therefore generate more discriminative representations while preserving the similarity between modalities. Finally, experiments on the three datasets MIRFLICKR-25K, IAPR TC-12, and NUS-WIDE demonstrate the superiority of TDSRDH over the other baselines, and ablation experiments confirm the effectiveness of the proposed ideas.
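
To make the design described above concrete, the following is a minimal PyTorch sketch of a Transformer-based text encoder and the three supervision signals. All names, architectures, and hyperparameters (TextEncoder, vocab_size, margin, the alpha/beta/gamma weights, and the exact form of each loss term) are illustrative assumptions, not the authors' implementation; in particular, the margin-based triplet term stands in for the distance constraint the abstract describes.

```python
import torch
import torch.nn.functional as F

class TextEncoder(torch.nn.Module):
    """Sketch of a Transformer encoder over word sequences: contextual word
    features are mean-pooled and mapped to relaxed hash codes in (-1, 1)."""

    def __init__(self, vocab_size=10000, d_model=256, n_layers=2, hash_bits=64):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab_size, d_model)
        layer = torch.nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = torch.nn.TransformerEncoder(layer, num_layers=n_layers)
        self.hash_head = torch.nn.Linear(d_model, hash_bits)

    def forward(self, tokens):                      # tokens: (batch, seq_len)
        h = self.encoder(self.embed(tokens))        # contextual word features
        return torch.tanh(self.hash_head(h.mean(dim=1)))

def pairwise_loss(f_img, f_txt, S):
    """Likelihood-style pairwise loss: S[i, j] = 1 if image i and text j are
    similar, else 0; minimizing it preserves cross-modal similarity."""
    theta = 0.5 * (f_img @ f_txt.t())               # pairwise inner products
    return (F.softplus(theta) - S * theta).mean()

def quantization_loss(f):
    """Penalize the gap between continuous features and their binarization,
    so little similarity information is lost when codes are quantized."""
    return F.mse_loss(f, torch.sign(f).detach())

def triplet_term(anchor, positive, negative, margin=1.0):
    """Margin loss: the anchor is pulled toward a similar (positive) instance
    and pushed at least `margin` away from a dissimilar (negative) one."""
    return F.triplet_margin_loss(anchor, positive, negative, margin=margin)

def triple_supervised_loss(f_img, f_txt, S, anc, pos, neg,
                           alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum of the three terms; alpha/beta/gamma are illustrative
    trade-off weights, not values reported in the paper."""
    return (alpha * triplet_term(anc, pos, neg)
            + beta * pairwise_loss(f_img, f_txt, S)
            + gamma * (quantization_loss(f_img) + quantization_loss(f_txt)))
```

Under these assumptions, a training step would minimize triple_supervised_loss over batches of image and text features, and retrieval-time hash codes would be obtained by binarizing the relaxed outputs with torch.sign.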

Keywords