Study on Cross-media Information Retrieval Based on Common Subspace Classification Learning

HAN Hong-qi, RAN Ya-xin, ZHANG Yun-liang, GUI Jie, GAO Xiong, YI Meng-lin

doi:10.11896/jsjkx.210200157

Jisuanji kexue (May 2022)

Affiliations

HAN Hong-qi, RAN Ya-xin, ZHANG Yun-liang, GUI Jie, GAO Xiong, YI Meng-lin: 1 Institute of Scientific and Technical Information of China, Beijing 100038, China; 2 Key Laboratory of Rich-media Knowledge Organization and Service of Digital Publishing Content, National Press and Publication Administration, Beijing 100038, China

DOI: https://doi.org/10.11896/jsjkx.210200157
Journal volume & issue: Vol. 49, no. 5
pp. 33 – 42

Abstract

The semantic similarity between data from two different media cannot be calculated directly because of the serious heterogeneity gap and semantic gap between them, which hinders both the implementation and the effectiveness of cross-media retrieval. Although common space learning can achieve cross-media semantic association and retrieval, its retrieval performance is unsatisfactory. The main reason is that it relies on common feature extraction techniques and general-purpose classification algorithms to implement semantic correlation and matching. To address this problem, the study proposes a novel cross-media correlation method called Stacking-DSCM-WR for cross-media retrieval between documents and images. WR means that text features are extracted with a word-embedding technique and image features are extracted with ResNet. DSCM means that deep semantic correlation and matching is used to project data of different modalities into a common subspace. Stacking is an ensemble learning algorithm; it is employed to produce the distribution of text documents and images over the same high-level conceptual semantic space for cross-media retrieval. Experiments are carried out on two smaller cross-media datasets, Wikipedia and Pascal Sentence, and one larger cross-media dataset, INRIA-Websearch. The results show that the proposed method can effectively extract text and image features and realize the correlation and matching of cross-media data in a high-level semantic space. Comparisons with similar cross-media retrieval methods show that the proposed method achieves the best retrieval performance as measured by the MAP metric.
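
To make the final retrieval and evaluation step concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each text and each image has already been mapped to a vector of class probabilities over the same set of semantic concepts, then ranks images for each text query and scores the ranking with mean average precision (MAP). The use of cosine similarity, the function names, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def cosine_similarity_matrix(queries, gallery):
    """Cosine similarity between every query row and every gallery row."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return q @ g.T


def average_precision(ranked_relevance):
    """Average precision for one ranked list of 0/1 relevance flags."""
    rel = np.asarray(ranked_relevance, dtype=float)
    n_relevant = rel.sum()
    if n_relevant == 0:
        return 0.0
    hits = np.cumsum(rel)                # relevant items seen up to each rank
    ranks = np.arange(1, len(rel) + 1)   # 1-based rank positions
    # Precision at each rank, counted only where a relevant item appears.
    return float(np.sum(hits / ranks * rel) / n_relevant)


def mean_average_precision(query_vecs, query_labels, gallery_vecs, gallery_labels):
    """MAP over all queries; an item is relevant if it shares the query's label."""
    sims = cosine_similarity_matrix(query_vecs, gallery_vecs)
    aps = []
    for i in range(sims.shape[0]):
        order = np.argsort(-sims[i], kind="stable")  # most similar first
        relevance = gallery_labels[order] == query_labels[i]
        aps.append(average_precision(relevance))
    return float(np.mean(aps))


# Toy data (hypothetical): class-probability vectors over three concepts.
text_probs = np.array([[0.9, 0.1, 0.0],
                       [0.1, 0.8, 0.1]])
text_labels = np.array([0, 1])
image_probs = np.array([[0.8, 0.1, 0.1],
                        [0.2, 0.7, 0.1],
                        [0.1, 0.1, 0.8],
                        [0.7, 0.2, 0.1]])
image_labels = np.array([0, 1, 2, 0])

# Text-to-image retrieval: texts are queries, images form the gallery.
text_to_image_map = mean_average_precision(
    text_probs, text_labels, image_probs, image_labels
)
print(f"text-to-image MAP: {text_to_image_map:.3f}")
```

Swapping the query and gallery arguments gives the image-to-text direction. Because both modalities live in the same concept space, no modality-specific distance is needed in this sketch.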

Keywords: cross-media information retrieval, semantic correlation, ensemble learning, word embedding, residual networks

Published in Jisuanji kexue

ISSN: 1002-137X (Print)
Publisher: Editorial office of Computer Science
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software; Technology: Technology (General)
Website: http://www.jsjkx.com/CN/1002-137X/home.shtml
