Bi-directional Image–Text Matching Deep Learning-Based Approaches: Concepts, Methodologies, Benchmarks and Challenges

Doaa B. Ebaid; Magda M. Madbouly; Adel A. El-Zoghabi

doi:10.1007/s44196-023-00260-3

International Journal of Computational Intelligence Systems (May 2023)

Affiliations

Doaa B. Ebaid: Department of Information Technology, Institute of Graduate Studies and Research, Alexandria University
Magda M. Madbouly: Department of Information Technology, Institute of Graduate Studies and Research, Alexandria University
Adel A. El-Zoghabi: Department of Information Technology, Institute of Graduate Studies and Research, Alexandria University

DOI: https://doi.org/10.1007/s44196-023-00260-3
Journal volume & issue: Vol. 16, no. 1
pp. 1 – 22

Abstract

Image–text matching (retrieval) has attracted growing attention as the volume of multimodal data increases. The task returns the images relevant to a textual query or, in the other direction, the descriptions that describe a given visual scene. The core challenge is to compute the similarity between text and image precisely, which requires understanding both modalities and accurately extracting the related information from each. Although many deep learning (DL) approaches have been proposed for matching textual data with visual content, few reviews of DL-based image–text matching are available. This review presents and clarifies modern DL-based techniques for the image–text matching problem through an extensive study of existing matching models, current architectures, benchmark datasets, and evaluation methods. First, we explain the matching task and illustrate frequently used architectures. Second, we classify current approaches according to two important concepts: the alignment between image and text, and the learning approach. Third, we report standard datasets and evaluation techniques. Finally, we outline current challenges to serve as an inspiration for new researchers in this field.
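
To make the bi-directional task in the abstract concrete, the following is a minimal sketch, not a method taken from the review. It assumes that images and captions have already been encoded into a shared embedding space and that cosine similarity is the matching score; the toy vectors and the helper names `cosine` and `rank` are hypothetical and chosen only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank(query, candidates):
    """Return candidate indices ordered from most to least similar to the query."""
    scores = [cosine(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)

# Toy embeddings (made up for illustration). In practice these would come
# from trained image and text encoders that map into a shared space.
image_embeddings = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.9]]
text_embeddings = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.0], [0.1, 0.1, 1.0]]

# Text-to-image retrieval: rank all images for the first caption.
text_to_image = rank(text_embeddings[0], image_embeddings)

# Image-to-text retrieval: rank all captions for the third image.
image_to_text = rank(image_embeddings[2], text_embeddings)

print(text_to_image)
print(image_to_text)
```

The same `rank` function serves both directions because, under the shared-space assumption, the only thing that changes is which modality supplies the query and which supplies the candidates.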

Published in International Journal of Computational Intelligence Systems

ISSN: 1875-6891 (Print); 1875-6883 (Online)
Publisher: Springer
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.springer.com/journal/44196
