IEEE Access (Jan 2018)

Learning Cross-Modal Aligned Representation With Graph Embedding

  • Youcai Zhang,
  • Jiayan Cao,
  • Xiaodong Gu

DOI
https://doi.org/10.1109/ACCESS.2018.2881997
Journal volume & issue
Vol. 6
pp. 77321–77333

Abstract


The main task of cross-modal analysis is to learn discriminative representations shared across different modalities. To obtain aligned representations, conventional approaches either construct and optimize a linear projection or train a complex architecture of deep layers, yet it is difficult to balance accuracy and efficiency when modeling multimodal data. This paper proposes a novel graph-embedding learning framework implemented by neural networks. The learned embedding directly approximates the cross-modal aligned representation, which is used to perform cross-modal retrieval and image classification combined with text information. The proposed framework extracts the learned representation from a graph model and simultaneously trains a classifier under semi-supervised settings. For optimization, unlike previous methods based on graph Laplacian regularization, a sampling strategy is adopted to generate training pairs that fully explore the inter-modal and intra-modal similarity relationships. Experimental results on various datasets show that the proposed framework outperforms other state-of-the-art methods on cross-modal retrieval. The framework also yields convincing improvements on the new task of image classification combined with text information on the Wiki dataset.
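To make the abstract's pair-sampling idea concrete, the following is a minimal PyTorch sketch of the general approach it describes: two modality-specific networks project image and text features into a shared embedding space, and training pairs sampled by class label (rather than a graph Laplacian regularizer) pull same-class embeddings together both across and within modalities. All names, dimensions, and the contrastive loss below are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class ModalityNet(nn.Module):
    """Projects one modality's features into the shared embedding space."""
    def __init__(self, in_dim: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, emb_dim),
        )

    def forward(self, x):
        # L2-normalize so distances in the shared space are comparable.
        return nn.functional.normalize(self.net(x), dim=1)

def sample_pairs(labels_a, labels_b, num_pairs):
    """Randomly sample (i, j, similar) index pairs from two label vectors."""
    i = torch.randint(len(labels_a), (num_pairs,))
    j = torch.randint(len(labels_b), (num_pairs,))
    similar = (labels_a[i] == labels_b[j]).float()
    return i, j, similar

def pair_loss(emb_a, emb_b, similar, margin=0.5):
    """Contrastive loss: pull similar pairs together, push others apart."""
    d = (emb_a - emb_b).pow(2).sum(dim=1).sqrt()
    return (similar * d.pow(2) +
            (1 - similar) * torch.clamp(margin - d, min=0).pow(2)).mean()

# Hypothetical dimensions: 4096-d image features, 300-d text features.
img_net, txt_net = ModalityNet(4096), ModalityNet(300)
opt = torch.optim.Adam(
    list(img_net.parameters()) + list(txt_net.parameters()), lr=1e-4)

# Dummy mini-batch standing in for real extracted features and labels.
img_feats, img_labels = torch.randn(64, 4096), torch.randint(10, (64,))
txt_feats, txt_labels = torch.randn(64, 300), torch.randint(10, (64,))

img_emb, txt_emb = img_net(img_feats), txt_net(txt_feats)

# Inter-modal pairs (image vs. text) and intra-modal pairs (image vs. image).
i, j, sim = sample_pairs(img_labels, txt_labels, 256)
loss = pair_loss(img_emb[i], txt_emb[j], sim)
i2, j2, sim2 = sample_pairs(img_labels, img_labels, 256)
loss = loss + pair_loss(img_emb[i2], img_emb[j2], sim2)

opt.zero_grad(); loss.backward(); opt.step()
```

Sampling explicit pairs, as in this sketch, avoids building the full affinity graph that Laplacian-based regularizers require, which is the efficiency point the abstract raises.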

Keywords