IEEE Open Journal of Signal Processing (Jan 2024)

TolerantGAN: Text-Guided Image Manipulation Tolerant to Real-World Image

  • Yuto Watanabe,
  • Ren Togo,
  • Keisuke Maeda,
  • Takahiro Ogawa,
  • Miki Haseyama

DOI
https://doi.org/10.1109/OJSP.2023.3343335
Journal volume & issue
Vol. 5
pp. 150–159

Abstract


Although text-guided image manipulation approaches have demonstrated highly accurate performance when editing the appearance of images in virtual or simple scenarios, their real-world application faces significant challenges. The primary cause of these challenges is the misalignment between the distributions of training and real-world data, which makes text-guided image manipulation unstable. In this work, we propose a novel framework called TolerantGAN and tackle the new task of real-world text-guided image manipulation independent of the training data. To achieve this, we introduce two key components: a border smoothly connection module (BSCM) and a manipulation direction-based attention module (MDAM). BSCM smooths the misalignment between the distributions of training and real-world data. MDAM extracts only the regions highly relevant to the image manipulation and assists in reconstructing objects unobserved in the training data. For in-the-wild input images of various classes, TolerantGAN robustly outperforms state-of-the-art methods.
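The abstract gives no equations for MDAM, so the following is only a minimal illustrative sketch of one plausible form of manipulation direction-based spatial attention: each spatial location of a feature map is weighted by its cosine similarity to a text-derived "manipulation direction" vector, so that edit-relevant regions are emphasized. All names, shapes, and the similarity-plus-softmax formulation are assumptions for illustration, not the authors' actual method.

```python
import numpy as np

def direction_based_attention(features, direction):
    """Hypothetical sketch: gate a (C, H, W) feature map by the cosine
    similarity of each spatial location to a (C,) manipulation-direction
    vector, normalized with a spatial softmax."""
    C, H, W = features.shape
    flat = features.reshape(C, H * W)                            # (C, HW)
    d = direction / (np.linalg.norm(direction) + 1e-8)           # unit direction
    f = flat / (np.linalg.norm(flat, axis=0, keepdims=True) + 1e-8)
    sim = d @ f                                                  # cosine similarity per location
    attn = np.exp(sim - sim.max())
    attn = attn / attn.sum()                                     # spatial softmax (sums to 1)
    gated = flat * attn                                          # emphasize edit-relevant regions
    return gated.reshape(C, H, W), attn.reshape(H, W)

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4, 4))       # toy feature map
direction = rng.standard_normal(8)           # toy text-derived direction
gated, attn = direction_based_attention(feats, direction)
```

In this sketch the attention map is a probability distribution over spatial locations, so regions aligned with the manipulation direction dominate the gated features while irrelevant regions are suppressed.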
