IEEE Access (Jan 2023)

GrainedCLIP and DiffusionGrainedCLIP: Text-Guided Advanced Models for Fine-Grained Attribute Face Image Processing

  • Jincheng Zhu
  • Liwei Mu

DOI
https://doi.org/10.1109/ACCESS.2023.3313248
Journal volume & issue
Vol. 11
pp. 99030 – 99045

Abstract

Text-guided image processing has made tremendous progress in recent years. Most existing methods rely on visual-language pre-training models for text-guided image processing. However, applying them to text-guided fine-grained attribute face image processing (e.g., editing a smiling face to change from showing teeth to a closed-mouth smile) leads to poor performance, because existing visual-language pre-training models learn only limited fine-grained semantic knowledge. To alleviate this problem, we propose a novel visual-language pre-training model based on fine-grained facial attribute features, which we call GrainedCLIP. Based on GrainedCLIP, we further propose a new text-guided fine-grained attribute face image processing model, which we call DiffusionGrainedCLIP. Our experimental results show that GrainedCLIP outperforms existing methods, achieving 12.61 R@1 and 12.17 R@1 in text-to-image and image-to-text retrieval, respectively, on the FFHQ dataset. Furthermore, compared to state-of-the-art text-guided face image processing methods, DiffusionGrainedCLIP significantly improves semantic consistency by 55.37% and face identity preservation by 49.38% on the FFHQ dataset.

Keywords