Generating Image Captions Using Bahdanau Attention Mechanism and Transfer Learning

Shahnawaz Ayoub; Yonis Gulzar; Faheem Ahmad Reegu; Sherzod Turaev

doi:10.3390/sym14122681

Symmetry (Dec 2022)

Affiliations

Shahnawaz Ayoub: Department of Computer Science and Engineering, Shri Venkateshwara University, NH-24, Venkateshwara Nagar, Gajraula 244236, Uttar Pradesh, India
Yonis Gulzar: Department of Management Information Systems, College of Business Administration, King Faisal University, Al-Ahsa 31982, Saudi Arabia
Faheem Ahmad Reegu: Department of Computer Science and Information Technology, Jazan University, Jazan 45142, Saudi Arabia
Sherzod Turaev: Department of Computer Science & Software Engineering, College of Information Technology, United Arab Emirates University, Al Ain 15551, United Arab Emirates

DOI: https://doi.org/10.3390/sym14122681
Journal volume & issue: Vol. 14, no. 12, p. 2681

Abstract

Automatic image caption prediction is a challenging task in natural language processing. Most researchers have used a convolutional neural network as the encoder and decoder. However, accurate image caption prediction requires a model to understand the semantic relationships between the various objects present in an image. The attention mechanism performs a linear combination of encoder and decoder states. It relates the semantic information present in the caption to the visual information present in the image. In this paper, we incorporate the Bahdanau attention mechanism with two pre-trained convolutional neural networks, Visual Geometry Group (VGG) and InceptionV3, to predict the captions of a given image. The two pre-trained models are used as encoders and a recurrent neural network is used as the decoder. With the help of the attention mechanism, the two encoders are able to provide semantic context information to the decoder and achieve a Bilingual Evaluation Understudy (BLEU) score of 62.5. Our main goal is to compare the performance of the two pre-trained models incorporated with the Bahdanau attention mechanism on the same dataset.
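
The combination of encoder and decoder states described in the abstract can be illustrated with a minimal sketch of additive (Bahdanau-style) attention in its commonly cited form: each encoder feature vector and the current decoder hidden state are projected, summed, passed through tanh, and scored, and the softmax of the scores weights the features into a context vector. This is not the authors' implementation. The function names, the dimensions, and the random weights below are illustrative assumptions; in a trained model the projection matrices are learned and the features come from the VGG or InceptionV3 encoder.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(features, hidden, w_enc, w_dec, v):
    """Additive attention: score_i = v . tanh(W_enc f_i + W_dec h).

    features: encoder feature vectors, one per image region
    hidden:   current decoder hidden state
    Returns (context_vector, attention_weights).
    """
    projected_hidden = matvec(w_dec, hidden)
    scores = []
    for feature in features:
        projected_feature = matvec(w_enc, feature)
        combined = [math.tanh(a + b)
                    for a, b in zip(projected_feature, projected_hidden)]
        scores.append(sum(vi * ci for vi, ci in zip(v, combined)))
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoder features.
    dim = len(features[0])
    context = [sum(w * f[j] for w, f in zip(weights, features))
               for j in range(dim)]
    return context, weights


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights or encoder outputs."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy sizes chosen only for illustration (assumptions, not from the paper).
rng = random.Random(0)
num_regions, feature_dim, hidden_dim, attn_dim = 4, 6, 5, 3
features = random_matrix(num_regions, feature_dim, rng)
hidden = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
w_enc = random_matrix(attn_dim, feature_dim, rng)
w_dec = random_matrix(attn_dim, hidden_dim, rng)
v = [rng.uniform(-0.5, 0.5) for _ in range(attn_dim)]

context, weights = additive_attention(features, hidden, w_enc, w_dec, v)
```

At each decoding step the decoder would receive `context` alongside the previous word embedding, so the weights indicate which image regions the model attends to when predicting the next word.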

Published in Symmetry

ISSN: 2073-8994 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics
Website: http://www.mdpi.com/journal/symmetry/
