Parallel Image Captioning Using 2D Masked Convolution

Chanrith Poleak; Jangwoo Kwon

doi:10.3390/app9091871

Applied Sciences (May 2019)

Affiliations

Chanrith Poleak: Department of Computer Engineering, Inha University, Incheon 402-751, Korea
Jangwoo Kwon: Department of Computer Engineering, Inha University, Incheon 402-751, Korea

DOI: https://doi.org/10.3390/app9091871
Journal volume & issue: Vol. 9, no. 9, p. 1871

Abstract

Automatically generating a novel description of an image is a challenging and important problem that brings together advanced research in both computer vision and natural language processing. In recent years, image captioning performance has improved significantly through the use of long short-term memory (LSTM) networks as the decoder for the language model. Despite this improvement, LSTM has shortcomings of its own: its structure is complicated and it is inherently sequential. This paper proposes a model that uses a simple convolutional network for both the encoder and the decoder in image captioning, instead of the current state-of-the-art approach. Our experiments with this model on the Microsoft Common Objects in Context (MSCOCO) captioning dataset yielded results that are competitive with state-of-the-art image captioning models across different evaluation metrics, while the model is much simpler and allows parallel graphics processing unit (GPU) computation during training, resulting in a faster training time.
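
To illustrate why a convolutional decoder can be trained in parallel while an LSTM cannot, the following is a minimal sketch of a masked (causal) convolution over a token sequence. It is a simplified 1D analogue written for illustration only, not the 2D masked convolution the paper proposes; the function name `causal_conv1d` and the toy inputs are assumptions that do not come from the paper.

```python
def causal_conv1d(x, kernel):
    """Causal (masked) 1D convolution: output[t] depends only on x[0..t].

    Hypothetical helper for illustration; not the paper's implementation.
    """
    k = len(kernel)
    # Left-pad with zeros so that no position can see future inputs.
    padded = [0.0] * (k - 1) + list(x)
    # Every output position is computed independently of the other outputs,
    # so on a GPU all positions could be evaluated at once. A recurrent
    # decoder would instead need output[t-1] before it can compute output[t].
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(x))
    ]


if __name__ == "__main__":
    kernel = [1.0, 2.0, 3.0]
    seq = [1.0, 2.0, 3.0, 4.0]
    out = causal_conv1d(seq, kernel)
    print(out)

    # Changing the last (future) input leaves all earlier outputs unchanged,
    # which is the masking property that makes teacher-forced training safe.
    changed = causal_conv1d([1.0, 2.0, 3.0, 99.0], kernel)
    print(out[:3] == changed[:3])
```

Because the masking keeps each position from seeing later tokens, the whole ground-truth caption can be fed to such a decoder in one pass during training, which is the property behind the faster training time reported in the abstract.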

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
