Image caption generation using Visual Attention Prediction and Contextual Spatial Relation Extraction

Reshmi Sasibhooshan; Suresh Kumaraswamy; Santhoshkumar Sasidharan

doi:10.1186/s40537-023-00693-9

Journal of Big Data (Feb 2023)

Affiliations

Reshmi Sasibhooshan: Department of Electronics and Communication Engineering, College of Engineering Trivandrum
Suresh Kumaraswamy: Department of Electronics and Communication Engineering, Government Engineering College, Mananthavady
Santhoshkumar Sasidharan: Department of Electronics and Communication Engineering, Government Engineering College, Painavu

DOI: https://doi.org/10.1186/s40537-023-00693-9
Journal volume & issue: Vol. 10, no. 1, pp. 1–18

Abstract

Automatic caption generation with attention mechanisms aims to produce more descriptive captions that cover the semantic content of an image from coarse to fine detail. In this work, we use an encoder-decoder framework built on a wavelet-transform-based Convolutional Neural Network (WCNN) with two-level discrete wavelet decomposition to extract visual feature maps that highlight the spatial, spectral and semantic details of the image. The Visual Attention Prediction Network (VAPN) computes both channel and spatial attention to obtain visually attentive features. In addition, local features are taken into account by considering the contextual spatial relationships between the different objects. Word prediction probabilities are obtained by combining this architecture with a Long Short-Term Memory (LSTM) decoder network. Experiments on three benchmark datasets (Flickr8K, Flickr30K and MSCOCO) show improved performance for the proposed model, with a CIDEr score of 124.2.
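
To make the "two-level discrete wavelet decomposition" mentioned in the abstract concrete, the following is a minimal NumPy sketch of a generic two-level 2D decomposition. It is not the authors' WCNN: the abstract does not name the wavelet, so the Haar wavelet is an assumption here, and the function names `haar_dwt2` and `two_level_dwt` are hypothetical.

```python
import numpy as np


def haar_dwt2(x):
    """One level of an orthonormal 2D Haar DWT (assumed wavelet).

    Expects a 2D array with even height and width. Returns the
    approximation band and a tuple of three detail bands.
    """
    s = np.sqrt(2.0)
    # Filter along rows: scaled sums (low-pass) and differences (high-pass).
    lo = (x[0::2, :] + x[1::2, :]) / s
    hi = (x[0::2, :] - x[1::2, :]) / s
    # Filter along columns to split each result into two sub-bands.
    approx = (lo[:, 0::2] + lo[:, 1::2]) / s
    detail_a = (lo[:, 0::2] - lo[:, 1::2]) / s
    detail_b = (hi[:, 0::2] + hi[:, 1::2]) / s
    detail_c = (hi[:, 0::2] - hi[:, 1::2]) / s
    return approx, (detail_a, detail_b, detail_c)


def two_level_dwt(image):
    """Apply the Haar DWT twice, re-decomposing the approximation band."""
    approx1, details1 = haar_dwt2(image)
    approx2, details2 = haar_dwt2(approx1)
    return approx2, details2, details1
```

Each level halves the spatial resolution, so an 8 x 8 input yields 4 x 4 detail bands at level one and 2 x 2 bands at level two. In a wavelet-based CNN, sub-bands like these would typically be fed to convolutional layers, but how the paper does so is not stated in the abstract.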

Published in Journal of Big Data

ISSN: 2196-1115 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware; Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://journalofbigdata.springeropen.com
