Jisuanji Kexue (Computer Science), Sep 2021

Cross-modal Retrieval Combining Deep Canonical Correlation Analysis and Adversarial Learning

  • LIU Li-bo, GOU Ting-ting

DOI
https://doi.org/10.11896/jsjkx.200600119
Journal volume & issue
Vol. 48, no. 9
pp. 200 – 207

Abstract


This paper proposes a cross-modal retrieval method (DCCA-ACMR) that integrates deep canonical correlation analysis (DCCA) and adversarial learning. The method improves the utilization of unlabeled samples, learns a more powerful feature projection model, and raises the accuracy of cross-modal retrieval. Specifically, under the DCGAN framework: 1) deep canonical correlation analysis constraints are added between the single-modal representation layers of the image and text branches to construct an image-text feature projection model, fully exploiting the semantic relevance of sample pairs; 2) the image-text feature projection model serves as the generator, and a modality classification model serves as the discriminator, together forming the image-text cross-modal retrieval model; 3) the common-subspace representation of samples is learned from both labeled and unlabeled samples through the adversarial game between generator and discriminator. We use mean average precision (mAP) to evaluate the proposed method on two public datasets, Wikipedia and NUS-WIDE-10k. The average mAP values of image-to-text and text-to-image retrieval are 0.556 and 0.563, respectively, across the two datasets. Experimental results show that the DCCA-ACMR method is superior to existing representative methods.
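The DCCA constraint in step 1) amounts to maximizing the total canonical correlation between the image-branch and text-branch representations. A minimal NumPy sketch of that quantity follows; the function name, the regularization value, and the eigendecomposition-based whitening are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def dcca_correlation(H1, H2, reg=1e-4):
    """Sum of canonical correlations between two views (higher is better).

    H1, H2: (n_samples, dim) feature matrices, e.g. the image-branch and
    text-branch representations of the same sample pairs. DCCA trains the
    two branches to maximize this quantity. `reg` is an assumed ridge term
    that keeps the covariance estimates invertible.
    """
    n = H1.shape[0]
    H1c = H1 - H1.mean(axis=0)          # center each view
    H2c = H2 - H2.mean(axis=0)
    # Regularized within-view and cross-view covariance estimates.
    S11 = H1c.T @ H1c / (n - 1) + reg * np.eye(H1.shape[1])
    S22 = H2c.T @ H2c / (n - 1) + reg * np.eye(H2.shape[1])
    S12 = H1c.T @ H2c / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root via eigendecomposition (S is symmetric PD).
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    # T = S11^{-1/2} S12 S22^{-1/2}; its singular values are the
    # canonical correlations between the two views.
    T = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
    return np.linalg.svd(T, compute_uv=False).sum()
```

In a deep framework this sum would serve as (the negative of) a differentiable loss on the two representation layers; here the closed form only illustrates what the constraint measures.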

Keywords