Applied Sciences (Oct 2018)

Middle-Level Attribute-Based Language Retouching for Image Caption Generation

  • Zhibin Guan,
  • Kang Liu,
  • Yan Ma,
  • Xu Qian,
  • Tongkai Ji

DOI
https://doi.org/10.3390/app8101850
Journal volume & issue
Vol. 8, no. 10
p. 1850

Abstract

Image caption generation is an attractive research topic that focuses on generating natural language sentences to describe the visual content of a given image. It is an interdisciplinary subject combining computer vision (CV) and natural language processing (NLP). Existing image captioning methods mainly generate the final image caption directly, which can lose important identifying information about the objects contained in the raw image. To solve this problem, we propose a new middle-level attribute-based language retouching (MLALR) method. The MLALR method uses middle-level attributes predicted from the object regions to retouch the intermediate image description generated by our language generation model. Its advantage is that it can correct descriptive errors in the intermediate image description and make the final image caption more accurate. Evaluation on the benchmark datasets MSCOCO, Flickr8K, and Flickr30K, using the metrics BLEU, METEOR, ROUGE-L, CIDEr, and SPICE, validated the strong performance of our MLALR method.

Keywords