IEEE Access (Jan 2023)
Self-Supervised Cluster-Contrast Distillation Hashing Network for Cross-Modal Retrieval
Abstract
Traditional cross-modal hashing models enable fast and efficient retrieval across multimodal data by learning high-quality hash representations. Feature extraction is key to cross-modal hashing; however, feature quality depends heavily on the semantic similarity between multimodal data, and existing methods do not effectively exploit the semantic information shared among the data. In this paper, we explore the semantic information inherent in the data using contrastive learning. Specifically, we propose an end-to-end cluster-level contrastive learning method for cross-modal hashing, termed the Self-Supervised Cluster-Contrast Distillation Hashing network (SCCDH). The method uses clustering results to guide feature learning within an appropriately designed contrastive framework. In SCCDH, feature-level and hash cluster-level contrastive learning help the model learn discriminative features across modalities. In addition, we propose a distillation-based filtering method that removes a large amount of noise from the data. Extensive experiments on the MIRFLICKR-25K, NUS-WIDE, and MS-COCO datasets demonstrate that the proposed method outperforms several state-of-the-art methods.
Keywords