IEEE Access (Jan 2020)
Enhancing Cross-Modal Retrieval Based on Modality-Specific and Embedding Spaces
Abstract
A new approach that drastically improves the performance of cross-modal retrieval between vision and language (hereinafter referred to as “vision and language retrieval”) is proposed in this paper. Vision and language retrieval takes data of one modality as a query to retrieve relevant data of another modality, enabling flexible retrieval across different modalities. Most existing methods learn optimal embeddings of visual and textual information into a single common representation space. However, we argue that forcing both modalities into a single space results in the loss of key information in sentences and images. In this paper, we propose a simple but robust vision and language retrieval method that effectively exploits multiple representation spaces. The proposed method makes use of multiple individual, modality-specific representation spaces through text-to-image and image-to-text models. Experimental results show that the proposed approach improves the performance of existing methods that embed visual and textual information into a single common representation space.
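As a rough illustration of the idea stated above, the sketch below compares a query and candidates in a modality-specific space rather than a single shared one. It is not the paper's implementation: the linear maps `W_t2i` and `W_i2t` are random stand-ins for the text-to-image and image-to-text models, and the feature dimensions are assumptions made only for this example.

```python
# Illustrative sketch: retrieval scored in modality-specific spaces.
# The mapping matrices below are random stand-ins for learned
# text-to-image and image-to-text models (assumption, not the paper's models).
import numpy as np

rng = np.random.default_rng(0)

TEXT_DIM, IMAGE_DIM = 300, 2048                              # assumed feature sizes
W_t2i = rng.standard_normal((IMAGE_DIM, TEXT_DIM)) * 0.01    # stand-in text-to-image mapping
W_i2t = rng.standard_normal((TEXT_DIM, IMAGE_DIM)) * 0.01    # stand-in image-to-text mapping


def cosine(query, candidates):
    """Cosine similarity between a query vector and each row of a candidate matrix."""
    q = query / (np.linalg.norm(query) + 1e-12)
    c = candidates / (np.linalg.norm(candidates, axis=1, keepdims=True) + 1e-12)
    return c @ q


def text_to_image_retrieval(text_feat, image_feats):
    """Map the text query into the image-specific space and score candidate images there."""
    return cosine(W_t2i @ text_feat, image_feats)


def image_to_text_retrieval(image_feat, text_feats):
    """Map the image query into the text-specific space and score candidate sentences there."""
    return cosine(W_i2t @ image_feat, text_feats)


# Toy usage: rank five candidate images for one text query.
text_query = rng.standard_normal(TEXT_DIM)
candidate_images = rng.standard_normal((5, IMAGE_DIM))
scores = text_to_image_retrieval(text_query, candidate_images)
print("ranking (best first):", np.argsort(-scores))
```

The design point this sketch tries to convey is that similarity is computed inside each modality's own space (image space for text-to-image queries, text space for image-to-text queries) instead of a single forced common space.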
Keywords