A Cross-Modal Hash Retrieval Method with Fused Triples

Wenxiao Li; Hongyan Mei; Yutian Li; Jiayao Yu; Xing Zhang; Xiaorong Xue; Jiahao Wang

doi:10.3390/app131810524

Applied Sciences (Sep 2023)

Affiliations

Wenxiao Li: College of Electronic and Information Engineering, Liaoning University of Technology, Jinzhou 121001, China
Hongyan Mei: College of Electronic and Information Engineering, Liaoning University of Technology, Jinzhou 121001, China
Yutian Li: College of Electronic and Information Engineering, Liaoning University of Technology, Jinzhou 121001, China
Jiayao Yu: College of Electronic and Information Engineering, Liaoning University of Technology, Jinzhou 121001, China
Xing Zhang: College of Electronic and Information Engineering, Liaoning University of Technology, Jinzhou 121001, China
Xiaorong Xue: College of Electronic and Information Engineering, Liaoning University of Technology, Jinzhou 121001, China
Jiahao Wang: College of Electronic and Information Engineering, Liaoning University of Technology, Jinzhou 121001, China

DOI: https://doi.org/10.3390/app131810524
Journal volume & issue: Vol. 13, no. 18, p. 10524

Abstract

Owing to its fast retrieval speed and low storage cost, cross-modal hashing has become the primary method for cross-modal retrieval. Since the emergence of deep cross-modal hashing methods, cross-modal retrieval performance has improved significantly. However, existing cross-modal hash retrieval methods still do not effectively utilize the dataset’s supervisory information and lack similarity expression ability. That is, the label information is not exploited to the fullest, and the potential semantic relationship between two modalities cannot be fully explored, which affects the judgment of semantic similarity between the two modalities. To address these problems, this paper proposes Tri-CMH, a cross-modal hash retrieval method with fused triples, an end-to-end modeling framework consisting of two parts: feature extraction and hash learning. First, the multi-modal data are preprocessed into triples. A data supervision matrix is constructed so that samples whose labels have the same meaning are aggregated together, while samples whose labels have opposite meanings are separated, thus avoiding under-utilization of the supervisory information in the dataset and making efficient use of the global supervisory information. Meanwhile, the loss function of the hash learning part is optimized by considering the Hamming distance loss, single-modality internal loss, cross-modality loss, and quantization loss, so as to explicitly constrain semantically similar and semantically dissimilar hash codes and to improve the model’s ability to judge cross-modality semantic similarity. The method is trained and tested on the IAPR-TC12, MIRFLICKR-25K, and NUS-WIDE datasets, with mAP and PR curves as the evaluation criteria, and the experimental results show the effectiveness and practicality of the method.
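
To make the triple-based idea concrete, the following is a minimal sketch of a generic triplet margin loss on hash codes together with a quantization penalty. It is not the Tri-CMH loss itself, whose exact form the abstract does not give: the function names, the margin value, and the choice of an image anchor with text positive and negative samples are all illustrative assumptions.

```python
from typing import List, Sequence


def sign_binarize(x: Sequence[float]) -> List[int]:
    """Quantize real-valued network outputs to a {-1, +1} hash code."""
    return [1 if v >= 0 else -1 for v in x]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(1 for u, v in zip(a, b) if u != v)


def relaxed_hamming(a: Sequence[float], b: Sequence[float]) -> float:
    """Relaxed Hamming distance, (K - <a, b>) / 2.

    For exact {-1, +1} codes this equals the true Hamming distance;
    for real-valued outputs it is a smooth surrogate.
    """
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return 0.5 * (len(a) - sum(u * v for u, v in zip(a, b)))


def triplet_hamming_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 2.0,  # assumed value, chosen only for illustration
) -> float:
    """Hinge loss that pulls the positive closer to the anchor than the
    negative by at least `margin`, measured in relaxed Hamming distance."""
    d_pos = relaxed_hamming(anchor, positive)
    d_neg = relaxed_hamming(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)


def quantization_loss(x: Sequence[float]) -> float:
    """Squared gap between real-valued outputs and their binarized codes."""
    return sum((v - s) ** 2 for v, s in zip(x, sign_binarize(x)))
```

In a cross-modal setting, the anchor could be the hash output for an image and the positive and negative could be outputs for texts with matching and non-matching labels (or the reverse). A full training objective would combine such terms with intra-modality and cross-modality terms under weights that this sketch does not attempt to specify.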

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
