Non-Co-Occurrence Enhanced Multi-Label Cross-Modal Hashing Retrieval Based on Graph Convolutional Network

Mingyong Li; Jiabao Fan; Ziyong Lin

doi:10.1109/access.2023.3245074

IEEE Access (Jan 2023)

Affiliations

Mingyong Li: School of Computer Technology and Information Science, Chongqing Normal University, Chongqing, China
Jiabao Fan: School of Computer Technology and Information Science, Chongqing Normal University, Chongqing, China
Ziyong Lin: School of Computer Technology and Information Science, Chongqing Normal University, Chongqing, China

DOI: https://doi.org/10.1109/access.2023.3245074
Journal volume & issue: Vol. 11, pp. 16310–16322

Abstract

Supervised cross-modal retrieval has significant advantages in retrieval efficiency and storage cost. In the field of hashing retrieval, existing supervised methods are divided into single-label and multi-label methods. Single-label methods measure the semantic relevance between instances with only one label, which introduces errors into the supervision information. Existing multi-label hashing methods have problems of their own. For example, considering only the co-occurrence of labels among instances may not accurately reflect their similarity. In addition, previous methods do not process the text modality at as fine a level as the image modality, so text information is underused. To address these issues, we propose Non-co-occurrence enhanced Multi-label cross-modal Hashing retrieval based on Graph Convolutional Network (MHGCN). First, we introduce a multi-label non-co-occurrence similarity measure, which adds non-co-occurrence information among instances to the multi-label similarity measure in order to capture the differences between instances. Second, we use Graph Convolutional Networks (GCNs) to process the text modality. Third, we introduce a memory mechanism to restrict the difference of hash code learning. Extensive experiments show that the proposed method achieves excellent performance. On three widely used datasets (NUS-WIDE, MIRFlickr-25k, IAPR TC-12), MAP on the image-text and text-image tasks improved significantly, by about 8%, 9%, and 7%, respectively.
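The point about co-occurrence can be illustrated with a small sketch. It is not the similarity measure defined in the paper: the function names, the toy label sets, and the use of Jaccard similarity as a stand-in for a measure that also accounts for non-shared labels are all assumptions made for illustration.

```python
def cooccurrence_similarity(a, b):
    """Binary similarity: 1.0 if the two label sets share at least one label.
    (Hypothetical helper, used here only as a co-occurrence-only baseline.)"""
    return 1.0 if set(a) & set(b) else 0.0


def jaccard_similarity(a, b):
    """Shared labels divided by all labels present in either instance.
    Labels that appear in only one instance lower the score.
    (Stand-in measure, not the one proposed in the paper.)"""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy label sets (invented for this example).
query = {"sky", "sea", "boat"}
close = {"sky", "sea", "boat"}              # identical labels
far = {"sky", "car", "street", "people"}    # one shared label, many differing

co_close = cooccurrence_similarity(query, close)
co_far = cooccurrence_similarity(query, far)
jac_close = jaccard_similarity(query, close)
jac_far = jaccard_similarity(query, far)

print("co-occurrence only:", co_close, co_far)
print("with non-shared labels:", jac_close, jac_far)
```

Under the co-occurrence-only baseline both pairs receive the same score, although the second pair differs in most of its labels. A measure that also counts labels not shared between the two instances separates the pairs, which is the kind of difference the non-co-occurrence information is meant to capture.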

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
