IEEE Access (Jan 2023)
Attention-Based Multimodal Deep Learning on Vision-Language Data: Models, Datasets, Tasks, Evaluation Metrics and Applications
Abstract
Multimodal learning has gained immense popularity due to the explosive growth in the volume of image and textual data across various domains. Heterogeneous vision-language multimodal data has been utilized to solve a variety of tasks, including classification, image segmentation, image captioning, and question answering. Consequently, several attention-based deep learning approaches have been proposed for image-text multimodal data. In this paper, we survey the current state of attention-based deep learning approaches for vision-language multimodal data, presenting a detailed description of the existing models, their performance, and the variety of evaluation metrics used to assess them. We revisit the attention mechanisms applied to image-text multimodal data from their inception in 2015 through 2022, covering a total of 75 articles. Our discussion also encompasses the current tasks, datasets, application areas, and future directions in this domain. To the best of our knowledge, this is the first survey to address the full scope of attention-based deep learning mechanisms for image-text multimodal data.
Keywords