IEEE Access (Jan 2024)

Sketch-Guided Latent Diffusion Model for High-Fidelity Face Image Synthesis

  • Yichen Peng,
  • Chunqi Zhao,
  • Haoran Xie,
  • Tsukasa Fukusato,
  • Kazunori Miyata

DOI
https://doi.org/10.1109/ACCESS.2023.3346408
Journal volume & issue
Vol. 12, pp. 5770–5780

Abstract


Synthesizing facial images from monochromatic sketches is one of the most fundamental tasks in image-to-image translation. However, it remains challenging to teach a model both high-dimensional face features, such as geometry and color, and the characteristics of the input sketches, which must be considered simultaneously. Existing methods often use sketches as indirect (or auxiliary) inputs to guide models, resulting in the loss of sketch features or in alterations to geometric information. In this paper, we introduce the Sketch-Guided Latent Diffusion Model (SGLDM), an LDM-based network architecture trained on a paired sketch-face dataset. We apply a Multi-Auto-Encoder (AE) to encode input sketches of different facial regions from pixel space into a feature map in the latent space, reducing the dimensionality of the sketch input while preserving the geometry-related information of local face details. We build the paired sketch-face dataset with existing methods, XDoG and Sketch Simplification, which extract an edge map from an image. We then introduce Stochastic Region Abstraction (SRA), an augmentation approach that improves the robustness of SGLDM to sketch inputs of arbitrary abstraction. Our evaluation shows that SGLDM can synthesize high-quality face images with different expressions, facial accessories, and hairstyles from sketches at various levels of abstraction. The code and model are available on the project page: https://puckikk1202.github.io/difffacesketch2023/
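As a rough illustration of the edge-map extraction step mentioned above, the Python sketch below applies a typical XDoG (eXtended Difference-of-Gaussians) filter to a grayscale photo. The parameter values are illustrative assumptions, not settings reported in the paper, and the subsequent Sketch Simplification stage is omitted.

    import cv2
    import numpy as np

    def xdog(gray, sigma=0.8, k=1.6, tau=0.99, eps=0.01, phi=50.0):
        """Typical XDoG edge map: the difference of two Gaussian blurs,
        soft-thresholded with tanh. All parameters here are illustrative."""
        img = gray.astype(np.float32) / 255.0
        g1 = cv2.GaussianBlur(img, (0, 0), sigma)      # fine-scale blur
        g2 = cv2.GaussianBlur(img, (0, 0), sigma * k)  # coarse-scale blur
        d = g1 - tau * g2                              # scaled difference
        e = np.where(d >= eps, 1.0, 1.0 + np.tanh(phi * (d - eps)))
        return (np.clip(e, 0.0, 1.0) * 255).astype(np.uint8)

    # Example: derive a pseudo-sketch from a face photo
    # sketch = xdog(cv2.imread("face.png", cv2.IMREAD_GRAYSCALE))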
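Stochastic Region Abstraction is described only at a high level in the abstract; a minimal reading is that facial-region strokes are randomly erased during training so the model tolerates incomplete or abstract sketches. The region boxes and drop probability below are hypothetical placeholders, not values from the paper.

    import random
    import numpy as np

    # Hypothetical (y0, y1, x0, x1) boxes for a 512x512 aligned face sketch;
    # the actual region layout used by SGLDM is not specified in the abstract.
    FACE_REGIONS = {
        "left_eye":  (180, 260, 120, 240),
        "right_eye": (180, 260, 272, 392),
        "nose":      (240, 360, 200, 312),
        "mouth":     (360, 440, 170, 342),
    }

    def stochastic_region_abstraction(sketch, drop_prob=0.3):
        """Blank random facial regions of a white-background sketch so the
        trained model stays robust to abstract or incomplete inputs."""
        out = sketch.copy()
        for (y0, y1, x0, x1) in FACE_REGIONS.values():
            if random.random() < drop_prob:
                out[y0:y1, x0:x1] = 255  # erase strokes in this region
        return out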
