Middle-Level Attribute-Based Language Retouching for Image Caption Generation

Zhibin Guan; Kang Liu; Yan Ma; Xu Qian; Tongkai Ji

doi:10.3390/app8101850

Applied Sciences (Oct 2018)


Affiliations

Zhibin Guan: School of Mechanical Electronic & Information Engineering, China University of Mining & Technology (Beijing), Beijing 100083, China
Kang Liu: School of Mechanical Electronic & Information Engineering, China University of Mining & Technology (Beijing), Beijing 100083, China
Yan Ma: School of Mechanical Electronic & Information Engineering, China University of Mining & Technology (Beijing), Beijing 100083, China
Xu Qian: School of Mechanical Electronic & Information Engineering, China University of Mining & Technology (Beijing), Beijing 100083, China
Tongkai Ji: School of Mechanical Electronic & Information Engineering, China University of Mining & Technology (Beijing), Beijing 100083, China

Journal volume & issue: Vol. 8, no. 10
p. 1850

Abstract


Image caption generation is an attractive research topic that focuses on generating natural language sentences to describe the visual content of a given image. It is an interdisciplinary subject combining computer vision (CV) and natural language processing (NLP). Existing image captioning methods mainly focus on generating the final image caption directly, which may lose significant identification information about the objects contained in the raw image. We therefore propose a new middle-level attribute-based language retouching (MLALR) method to address this problem. MLALR uses middle-level attributes predicted from object regions to retouch the intermediate image description generated by our language generation model. The advantage of MLALR is that it can correct descriptive errors in the intermediate image description and thereby make the final image caption more accurate. Evaluation on the benchmark datasets MSCOCO, Flickr8K, and Flickr30K, using the BLEU, METEOR, ROUGE-L, CIDEr, and SPICE metrics, validated the impressive performance of our MLALR method.
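To make the idea of "retouching" an intermediate caption with predicted attributes more concrete, the following is a toy sketch under stated assumptions; it is not the MLALR method as published, whose actual retouching mechanism is not described in this abstract. It assumes a hypothetical attribute predictor that returns a confidence per attribute word, and a hypothetical hand-written `confusable` map listing which caption words may be swapped for which attributes. The function name `retouch_caption` and the `threshold` parameter are likewise illustrative only.

```python
from typing import Dict, List


def retouch_caption(
    caption: str,
    attributes: Dict[str, float],
    confusable: Dict[str, List[str]],
    threshold: float = 0.5,
) -> str:
    """Toy retouching (hypothetical, not the paper's algorithm).

    A word in the intermediate caption is replaced by a confusable
    alternative only when the attribute predictor does NOT support the
    word (score below threshold) but DOES support the alternative.
    """
    retouched = []
    for word in caption.split():
        best = word
        # Only consider replacing words the predicted attributes do not support.
        if attributes.get(word, 0.0) < threshold:
            candidates = [
                alt
                for alt in confusable.get(word, [])
                if attributes.get(alt, 0.0) >= threshold
            ]
            if candidates:
                # Pick the alternative with the highest predicted confidence.
                best = max(candidates, key=lambda alt: attributes[alt])
        retouched.append(best)
    return " ".join(retouched)


if __name__ == "__main__":
    # Hypothetical attribute scores, as if predicted from object regions.
    attrs = {"cat": 0.92, "sofa": 0.81, "dog": 0.10}
    confusable = {"dog": ["cat"], "bed": ["sofa", "couch"]}
    print(retouch_caption("a dog sleeping on a bed", attrs, confusable))
    # a cat sleeping on a sofa
```

In this sketch the caption word is kept whenever the attributes support it, so retouching only corrects words that conflict with what the attribute predictor sees; a learned model would replace both the fixed `confusable` map and the hard threshold.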

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
