Image Caption Generation Using Multi-Level Semantic Context Information

Peng Tian; Hongwei Mo; Laihao Jiang

doi:10.3390/sym13071184

Symmetry (Jun 2021)

Affiliations

Peng Tian: College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
Hongwei Mo: College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
Laihao Jiang: College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China

DOI: https://doi.org/10.3390/sym13071184
Journal volume & issue: Vol. 13, no. 7
p. 1184

Abstract

Object detection, visual relationship detection, and image captioning, the three main visual tasks in scene understanding, are highly correlated and correspond to different semantic levels of a scene image. However, existing captioning methods convert the extracted image features into descriptive text, and the results obtained are not satisfactory. In this work, we propose a Multi-level Semantic Context Information (MSCI) network with an overall symmetrical structure. It leverages the mutual connections across the three semantic layers and extracts the context information between them, solving the three vision tasks jointly to achieve an accurate and comprehensive description of the scene image. The model uses a feature refining structure to establish mutual connections between the different semantic features of the image and to update them iteratively. A context information extraction network then extracts the context information between the three semantic layers. An attention mechanism is introduced to improve the accuracy of image captioning, and the same inter-layer context information is used to improve the accuracy of object detection and relationship detection. Experiments on the VRD and COCO datasets demonstrate that the proposed model can leverage the context information between semantic layers to improve the accuracy of these visual tasks.
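The abstract does not specify how its attention mechanism is computed. As a rough illustration only, the following sketch shows generic scaled dot-product soft attention. In this sketch, a caption decoder's hidden state weights a set of feature vectors drawn from the object and relationship layers. This is an assumption, not the MSCI implementation: the function names (`soft_attention`, `softmax`, `dot`), the dot-product scoring, the feature dimension, and the toy vectors are all hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(query: Vector, features: List[Vector]) -> Tuple[List[float], Vector]:
    """Generic scaled dot-product soft attention (hypothetical, not the paper's code).

    query    -- hypothetical caption-decoder hidden state at one time step
    features -- hypothetical feature vectors from the object and relationship
                layers; every vector has the same length d as the query
    Returns the attention weights and the weighted context vector.
    """
    d = len(query)
    # Score each feature by its similarity to the query, scaled by sqrt(d).
    scores = [dot(query, f) / math.sqrt(d) for f in features]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the feature vectors.
    context = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(d)]
    return weights, context


if __name__ == "__main__":
    # Toy 2-D features: two object-level vectors and one relationship-level vector.
    object_feats = [[1.0, 0.0], [0.0, 1.0]]
    relation_feats = [[0.5, 0.5]]
    hidden = [2.0, 0.0]
    weights, context = soft_attention(hidden, object_feats + relation_feats)
    print([round(w, 3) for w in weights])
    print([round(c, 3) for c in context])
```

In this toy run, the first object feature is most aligned with the hidden state, so it receives the largest weight. The resulting context vector is the kind of signal a decoder could combine with its state when predicting the next word.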

Published in Symmetry

ISSN: 2073-8994 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics
Website: http://www.mdpi.com/journal/symmetry/