Review of Image Captioning Methods Based on Encoding-Decoding Technology

GENG Yaogang, MEI Hongyan, ZHANG Xing, LI Xiaohui

doi:10.3778/j.issn.1673-9418.2112080

Jisuanji kexue yu tansuo (Oct 2022)


Affiliations

GENG Yaogang, MEI Hongyan, ZHANG Xing, LI Xiaohui: School of Electronic and Information Engineering, Liaoning University of Technology, Jinzhou, Liaoning 121000, China

DOI: https://doi.org/10.3778/j.issn.1673-9418.2112080
Journal volume & issue: Vol. 16, no. 10, pp. 2234–2248

Abstract

Image caption generation is a multimodal task in artificial intelligence that combines research in computer vision and natural language processing to convert images into text. It plays an important role in visual assistance and image understanding, and has attracted extensive attention from researchers in recent years. This paper first describes the image caption generation task and introduces three families of methods: template-based, retrieval-based, and encoding-decoding-based. For each, it covers the core idea, representative work, advantages, and disadvantages. It then examines encoding-decoding methods in detail, in terms of model structure and of research progress in the image understanding phase and the caption generation phase. Image understanding research covers attention mechanisms and semantic aspects; caption generation research is divided into traditional, dense, and stylized caption generation. The performance, advantages, and disadvantages of the models are summarized, and the datasets and evaluation metrics used to assess image captioning models are introduced. Finally, the challenges and difficulties facing the field of image captioning are pointed out.

Keywords: image caption generation; encode; decode; multimodal; attention mechanism

Published in Jisuanji kexue yu tansuo

ISSN: 1673-9418 (Print)
Publisher: Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://fcst.ceaj.org
