IEEE Access (Jan 2022)

Text-Guided Sketch-to-Photo Image Synthesis

  • Uche Osahor,
  • Nasser M. Nasrabadi

DOI
https://doi.org/10.1109/ACCESS.2022.3206771
Journal volume & issue
Vol. 10
pp. 98278 – 98289

Abstract


We propose a text-guided sketch-to-image synthesis model that semantically mixes style and content features from the latent space of an inverted Generative Adversarial Network (GAN). Our goal is to synthesize plausible images from human facial sketches and their respective text descriptions. In our approach, we adapted a generative model, termed Contextual GAN (CT-GAN), that efficiently encodes visual-linguistic semantic features, pre-trained on over 400 million text-image pairs, at different resolutions along the model. We also introduced an intermediate mapping network, called c-Map, that maps textual and visual features into a disentangled latent space $\mathcal{W}+$ for better feature matching. Furthermore, to maximize the computational performance of our model, we implemented a linear attention scheme along the pipeline, avoiding the drawbacks of attention modules whose complexity is quadratic. Finally, the hierarchical setting of our model ensures that textual, style, and content features are synthesized based on their unique fine-grained details, which results in visually appealing images.
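
The abstract does not say which linear attention formulation the model uses. As a minimal sketch of the general idea only, the Python below assumes a kernelized attention with the positive feature map elu(x) + 1, which is one common choice and not necessarily the authors'. The function names (feature_map, linear_attention, quadratic_attention) are hypothetical. The point it illustrates is that the key/value summaries are computed once and reused for every query, so no n x n weight matrix is ever built.

```python
import math


def feature_map(x):
    """elu(x) + 1: a strictly positive feature map (an assumed choice)."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def linear_attention(queries, keys, values):
    """Kernelized attention whose cost grows linearly with sequence length."""
    d = len(keys[0])    # key/query feature dimension
    m = len(values[0])  # value dimension
    # Summaries over all keys and values, computed once.
    kv = [[0.0] * m for _ in range(d)]  # sum_j phi(k_j) v_j^T
    k_sum = [0.0] * d                   # sum_j phi(k_j)
    for k, v in zip(keys, values):
        pk = feature_map(k)
        for a in range(d):
            k_sum[a] += pk[a]
            for b in range(m):
                kv[a][b] += pk[a] * v[b]
    # Each query only touches the d x m summary, not every key.
    out = []
    for q in queries:
        pq = feature_map(q)
        norm = sum(pq[a] * k_sum[a] for a in range(d))
        out.append(
            [sum(pq[a] * kv[a][b] for a in range(d)) / norm for b in range(m)]
        )
    return out


def quadratic_attention(queries, keys, values):
    """Reference: same kernel, but computes all n x n weights explicitly."""
    m = len(values[0])
    phi_keys = [feature_map(k) for k in keys]
    out = []
    for q in queries:
        pq = feature_map(q)
        weights = [sum(a * b for a, b in zip(pq, pk)) for pk in phi_keys]
        total = sum(weights)
        out.append(
            [sum(w * v[b] for w, v in zip(weights, values)) / total
             for b in range(m)]
        )
    return out
```

Both functions compute the same output up to floating-point rounding. The linear version does so by reordering the matrix products, which is where the saving over the quadratic form comes from.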

Keywords