Intelligent Computing (Jan 2023)

CD-GAN: Commonsense-Driven Generative Adversarial Network with Hierarchical Refinement for Text-to-Image Synthesis

  • Guokai Zhang,
  • Ning Xu,
  • Chenggang Yan,
  • Bolun Zheng,
  • Yulong Duan,
  • Bo Lv,
  • An-An Liu

DOI
https://doi.org/10.34133/icomputing.0017
Journal volume & issue
Vol. 2

Abstract

Synthesizing vivid images from descriptive text is emerging as a frontier cross-domain generation task. However, a single sentence is inadequate for accurately generating a high-quality image because of the information asymmetry between modalities, so external knowledge is needed to balance the process. Moreover, the limited description of entities in the sentence cannot guarantee semantic consistency between the text and the generated image, which leads to missing detail in the foreground and background. Here, we propose a commonsense-driven generative adversarial network (CD-GAN) that generates photo-realistic images using entity-related commonsense knowledge. CD-GAN contains 2 key commonsense-based modules: (a) entity semantic augment, which enhances entity semantics with commonsense knowledge to reduce the information asymmetry, and (b) adaptive entity refinement, which generates the high-resolution image in multiple stages, guided by various kinds of commonsense knowledge, to maintain text-image consistency. We demonstrate extensive synthesis cases on the widely used CUB-birds (Caltech-UCSD Birds-200-2011) dataset, where our model achieves competitive results compared with other state-of-the-art models.