Cross-Modal Retrieval: A Review of Methodologies, Datasets, and Future Perspectives

Zhichao Han; Azreen Bin Azman; Mas Rina Binti Mustaffa; Fatimah Binti Khalid

doi:10.1109/ACCESS.2024.3444817

IEEE Access (Jan 2024)

Affiliations

Zhichao Han: Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Serdang, Malaysia
Azreen Bin Azman: Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Serdang, Malaysia
Mas Rina Binti Mustaffa: Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Serdang, Malaysia
Fatimah Binti Khalid: Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Serdang, Malaysia

DOI: https://doi.org/10.1109/ACCESS.2024.3444817
Journal volume & issue: Vol. 12, pp. 115716–115741

Abstract

With the rapid development of science and technology, mixed media of all types now carry large amounts of data, and traditional single-modality multimedia data can no longer satisfy everyday requirements. Consequently, there is a pressing need for cross-modal retrieval technology. Its purpose is to mine the connections between samples of different modalities, that is, to use a sample from one modality to retrieve samples with similar semantics from another. For example, users can retrieve multimedia data such as images or videos with a text query. However, different types of multimedia data are represented differently, and measuring the correlation between modalities is the main problem of cross-modal retrieval. Deep learning methods, currently the most popular approach, have achieved remarkable results in data processing and graphics, and many researchers have applied them to cross-modal retrieval to solve the problem of measuring similarity between different types of multimedia data. Drawing on the relevant cross-modal retrieval literature, this paper defines the cross-modal retrieval problem, reviews the core ideas of the current mainstream methods under three main categories, lists the commonly used datasets and evaluation methods, and finally analyzes the open problems and future research trends of cross-modal retrieval.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
