Image Captioning Based on Deep Neural Networks

Liu Shuang; Bai Liang; Hu Yanli; Wang Haoran

doi:10.1051/matecconf/201823201052

MATEC Web of Conferences (Jan 2018)

Affiliations

Liu Shuang: College of Systems Engineering, National University of Defense Technology
Bai Liang: College of Systems Engineering, National University of Defense Technology
Hu Yanli: College of Systems Engineering, National University of Defense Technology
Wang Haoran: College of Systems Engineering, National University of Defense Technology

DOI: https://doi.org/10.1051/matecconf/201823201052
Journal volume & issue: Vol. 232, p. 01052

Abstract

With the development of deep learning, the combination of computer vision and natural language processing has attracted great attention in the past few years. Image captioning is representative of this field: the computer learns to describe the visual content of an image in one or more sentences. Generating a meaningful description of high-level image semantics requires not only recognizing the objects and the scene, but also analyzing the states and attributes of these objects and the relationships among them. Although image captioning is a complicated and difficult task, many researchers have achieved significant improvements. In this paper, we mainly describe three image captioning frameworks that use deep neural networks: CNN-RNN based, CNN-CNN based, and reinforcement-based. We then introduce representative work for each of these three frameworks, describe the evaluation metrics, and summarize the benefits and major challenges.

Published in MATEC Web of Conferences

ISSN: 2261-236X (Online)
Publisher: EDP Sciences
Country of publisher: France
LCC subjects: Technology: Engineering (General). Civil engineering (General)
Website: http://www.matec-conferences.org
