IEEE Access (Jan 2020)

Instance Mask Embedding and Attribute-Adaptive Generative Adversarial Network for Text-to-Image Synthesis

  • Jiancheng Ni,
  • Susu Zhang,
  • Zili Zhou,
  • Jie Hou,
  • Feng Gao

DOI: https://doi.org/10.1109/ACCESS.2020.2975841
Journal volume & issue: Vol. 8, pp. 37697–37711

Abstract


Existing image generation models can synthesize plausible individual objects and complex but low-resolution images; generating high-resolution images directly from complicated text remains a challenge. To this end, we propose the instance mask embedding and attribute-adaptive generative adversarial network (IMEAA-GAN). First, a box regression network computes a global layout containing the class label and location of each instance. The global generator then encodes this layout and combines it with the whole-text embedding and a noise vector to generate a preliminary low-resolution image, while the instance mask embedding mechanism guides the local refinement generators to obtain fine-grained local features and produce a more realistic image. Finally, to synthesize exact visual attributes, we introduce a multi-scale attribute-adaptive discriminator, which provides the local refinement generators with specific training signals to explicitly generate instance-level features. Extensive experiments on the MS-COCO and Caltech-UCSD Birds-200-2011 datasets show that our model obtains globally consistent attributes and generates complex images with local texture details.
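The abstract describes a staged pipeline: text embedding → layout prediction → global low-resolution generation → instance-guided local refinement → multi-scale discrimination. The following is a minimal, hypothetical PyTorch sketch of how such stages might chain together; every module name, tensor dimension, and architectural choice below is an illustrative assumption and not the authors' implementation.

```python
# Hypothetical sketch of an IMEAA-GAN-style pipeline (not the paper's code).
import torch
import torch.nn as nn


class BoxRegressionNet(nn.Module):
    """Predicts a coarse layout (class logits + normalized boxes) from text."""
    def __init__(self, text_dim=256, num_classes=80, max_instances=8):
        super().__init__()
        self.max_instances, self.num_classes = max_instances, num_classes
        self.fc = nn.Linear(text_dim, max_instances * (num_classes + 4))

    def forward(self, text_emb):
        out = self.fc(text_emb).view(-1, self.max_instances, self.num_classes + 4)
        class_logits, boxes = out[..., :self.num_classes], out[..., self.num_classes:]
        return class_logits, torch.sigmoid(boxes)  # boxes in [0, 1]


class GlobalGenerator(nn.Module):
    """Encodes the layout, fuses it with text embedding and noise,
    and produces a low-resolution (64x64) base image."""
    def __init__(self, text_dim=256, noise_dim=100, layout_dim=8 * 84):
        super().__init__()
        self.fc = nn.Linear(text_dim + noise_dim + layout_dim, 128 * 8 * 8)
        self.up = nn.Sequential(
            nn.Upsample(scale_factor=2), nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, text_emb, noise, layout):
        h = torch.cat([text_emb, noise, layout.flatten(1)], dim=1)
        return self.up(self.fc(h).view(-1, 128, 8, 8))  # (B, 3, 64, 64)


class LocalRefinementGenerator(nn.Module):
    """Upsamples the base image conditioned on an instance mask embedding
    to add instance-level texture detail (64x64 -> 128x128 here)."""
    def __init__(self, mask_emb_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + mask_emb_dim, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, base_img, mask_emb):
        b, _, h, w = base_img.shape
        # Broadcast the instance mask embedding over the spatial grid.
        m = mask_emb.view(b, -1, 1, 1).expand(-1, -1, h, w)
        return self.net(torch.cat([base_img, m], dim=1))


class MultiScaleDiscriminator(nn.Module):
    """Scores the image at two scales; a stand-in for the attribute-adaptive
    discriminator that supplies instance-level training signals."""
    def __init__(self):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1),
            )
        self.full, self.half = branch(), branch()

    def forward(self, img):
        return self.full(img), self.half(nn.functional.avg_pool2d(img, 2))


# Toy forward pass (batch of 2) showing how the stages chain together.
text_emb, noise = torch.randn(2, 256), torch.randn(2, 100)
box_net, g_global = BoxRegressionNet(), GlobalGenerator()
g_local, disc = LocalRefinementGenerator(), MultiScaleDiscriminator()

class_logits, boxes = box_net(text_emb)
layout = torch.cat([class_logits, boxes], dim=-1)   # (2, 8, 84)
base = g_global(text_emb, noise, layout)            # (2, 3, 64, 64)
mask_emb = torch.randn(2, 64)                       # placeholder instance mask embedding
refined = g_local(base, mask_emb)                   # (2, 3, 128, 128)
score_full, score_half = disc(refined)              # two real/fake scores
```

In this sketch the instance mask embedding is a random placeholder; in the paper it is derived from predicted instance masks so that refinement is conditioned on instance-level content, which the sketch does not model.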

Keywords