IEEE Access (Jan 2024)
Sketch-Guided Latent Diffusion Model for High-Fidelity Face Image Synthesis
Abstract
Synthesizing facial images from monochromatic sketches is one of the most fundamental tasks in the field of image-to-image translation. However, it remains challenging to train a model to learn high-dimensional face features, such as geometry and color, while simultaneously accounting for the characteristics of the input sketches. Existing methods often use sketches as indirect inputs (or as auxiliary inputs) to guide models, resulting in the loss of sketch features or in alterations to geometry information. In this paper, we introduce a Sketch-Guided Latent Diffusion Model (SGLDM), an LDM-based network architecture trained on a paired sketch-face dataset. We apply a Multi-Auto-Encoder (AE) to encode input sketches of different facial regions from pixel space into a feature map in latent space, reducing the dimensionality of the sketch input while preserving the geometric information of local facial details. We build a paired sketch-face dataset based on existing methods, XDoG and Sketch Simplification, which extract edge maps from images. We then introduce Stochastic Region Abstraction (SRA), a data-augmentation approach that improves the robustness of SGLDM to sketch inputs of arbitrary abstraction levels. Our evaluation shows that SGLDM can synthesize high-quality face images with different expressions, facial accessories, and hairstyles from sketches of various abstraction levels. The code and model are available on the project page: https://puckikk1202.github.io/difffacesketch2023/
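To make the regional sketch encoding concrete, the sketch below is a minimal, illustrative PyTorch example of a multi-encoder that maps per-region sketch crops into latent feature maps and concatenates them into a single conditioning tensor for a latent diffusion model. It is not the authors' implementation: the region names, encoder architecture, crop resolution, and channel-wise concatenation are all assumptions made for illustration.

```python
# Illustrative sketch (not the authors' code): one convolutional encoder per
# facial region; the per-region latent maps are concatenated along the channel
# axis to form an LDM conditioning tensor. All sizes are hypothetical.
import torch
import torch.nn as nn


class RegionEncoder(nn.Module):
    """Encodes one monochrome sketch region (1 x H x W) into a latent feature map."""

    def __init__(self, latent_channels: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, latent_channels, 3, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class MultiRegionSketchEncoder(nn.Module):
    """One encoder per facial region; outputs are concatenated channel-wise."""

    def __init__(self, regions=("left_eye", "right_eye", "nose", "mouth", "outline")):
        super().__init__()
        self.encoders = nn.ModuleDict({r: RegionEncoder() for r in regions})

    def forward(self, region_sketches: dict) -> torch.Tensor:
        feats = [self.encoders[r](region_sketches[r]) for r in self.encoders]
        return torch.cat(feats, dim=1)


if __name__ == "__main__":
    # Five 256x256 region crops -> a (1, 20, 32, 32) conditioning map.
    enc = MultiRegionSketchEncoder()
    dummy = {r: torch.randn(1, 1, 256, 256) for r in enc.encoders}
    print(enc(dummy).shape)
```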
Keywords