IEEE Access (Jan 2023)

The State of the Art for Cross-Modal Retrieval: A Survey

  • Kun Zhou,
  • Fadratul Hafinaz Hassan,
  • Gan Keng Hoon

DOI
https://doi.org/10.1109/ACCESS.2023.3338548
Journal volume & issue
Vol. 11
pp. 138568 – 138589

Abstract

Cross-modal retrieval, which aims to search for semantically relevant data across different modalities, has received increasing attention in recent years. Deep learning, with its ability to extract high-level representations from multimodal data, has become a popular approach for cross-modal retrieval. In this paper, we present a comprehensive survey of deep learning techniques for cross-modal retrieval, covering 37 papers published in recent years. The review is organized into four main sections: traditional subspace learning methods; deep learning and machine learning-based approaches; techniques based on large multimodal models; and an analysis of datasets used in the field of cross-modal retrieval. We compare and analyze the performance of different deep learning methods on benchmark datasets. The results show that although a large number of innovative methods have been proposed, several problems remain to be solved, such as multimodal feature alignment, multimodal feature fusion, and subspace learning, as well as the need for specialized datasets.

Keywords