Cross‐modal semantic correlation learning by Bi‐CNN network

Chaoyi Wang; Liang Li; Chenggang Yan; Zhan Wang; Yaoqi Sun; Jiyong Zhang

doi:10.1049/ipr2.12176

IET Image Processing (Dec 2021)

Affiliations

Chaoyi Wang: Hangzhou Dianzi University, Hangzhou, China
Liang Li: Institute of Computing Technology, CAS, Beijing, China
Chenggang Yan: Hangzhou Dianzi University, Hangzhou, China
Zhan Wang: RTInvent Technology Co., Ltd, Beijing, China
Yaoqi Sun: Hangzhou Dianzi University, Hangzhou, China
Jiyong Zhang: Hangzhou Dianzi University, Hangzhou, China

DOI: https://doi.org/10.1049/ipr2.12176
Journal volume & issue: Vol. 15, no. 14, pp. 3674–3684

Abstract

Cross‐modal retrieval retrieves images from a text query and vice versa, and it has attracted extensive attention in recent years. Most existing cross‐modal retrieval methods aim to find a common subspace that maximizes the correlation between modalities. To generate specific representations consistent with cross‐modal tasks, this paper proposes a novel cross‐modal retrieval framework that integrates feature learning and latent space embedding. In detail, we propose a deep CNN and a shallow CNN to extract sample features. The deep CNN extracts image representations, and the shallow CNN uses a multi‐dimensional kernel to extract multi‐level semantic representations of text. Meanwhile, we enhance the semantic manifold by constructing a cross‐modal ranking loss and a within‐modal discriminant loss to improve the separation of semantic representations. Moreover, the most representative samples are selected with an online sampling strategy, so that the approach can be applied to large‐scale data. The approach not only increases the discriminative ability among different categories but also maximizes the correlation between modalities. Experiments on three real‐world datasets show that the proposed method outperforms popular methods.
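
The abstract does not give the form of the cross‐modal ranking loss or of the online sampling strategy. As a rough illustration only, the sketch below shows one common way such a loss is built: a bidirectional hinge ranking loss over paired image and text embeddings, where the hardest negative in the batch (the most similar sample from a different category) is picked for each anchor. The function names, the cosine similarity, the margin value of 0.2, and the hardest‐negative rule are all assumptions made for this example and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def ranking_loss(img_emb, txt_emb, labels, margin=0.2):
    """Hypothetical bidirectional ranking loss with in-batch hard negatives.

    img_emb[i] and txt_emb[i] are assumed to be a matched image/text pair
    with category labels[i]. The margin value is an illustrative choice.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    # sim[i][j] = similarity between image i and text j
    sim = [[cosine(img_emb[i], txt_emb[j]) for j in range(n)] for i in range(n)]
    total = 0.0
    for i in range(n):
        pos = sim[i][i]  # similarity of the matched pair
        # Negatives are samples from a different category than anchor i.
        neg_txt = [sim[i][j] for j in range(n) if labels[j] != labels[i]]
        neg_img = [sim[j][i] for j in range(n) if labels[j] != labels[i]]
        if neg_txt:  # image anchor against its hardest text negative
            total += max(0.0, margin - pos + max(neg_txt))
        if neg_img:  # text anchor against its hardest image negative
            total += max(0.0, margin - pos + max(neg_img))
    return total / n
```

With this formulation the loss is zero once every matched pair is more similar than its hardest cross‐category negative by at least the margin, and only the selected negatives contribute, which is what keeps the per‐batch cost manageable on large datasets.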

Published in IET Image Processing

ISSN: 1751-9659 (Print); 1751-9667 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Technology: Photography; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/17519667
