IEEE Access (Jan 2024)
Colorize at Will: Harnessing Diffusion Prior for Image Colorization
Abstract
Image colorization, a pivotal task in computer vision, employs advanced algorithms to render grayscale images in realistic colors. This task is inherently challenging due to the need to balance colorfulness and fidelity while preserving local spatial structures and eliminating ghosting effects. To address these issues, our research introduces a novel pipeline leveraging Stable Diffusion for image colorization, guided by color hint points or textual descriptions. Compared to current text-to-image models, our key contributions include multi-modal input flexibility, a trainable pixel-level encoder, and a controllable feature modulation block. The multi-modal input flexibility allows grayscale images to be combined with color hint points and textual descriptions simultaneously, facilitating the generation of colorized outputs with greater precision and closer alignment with user instructions. The trainable pixel-level encoder extracts multi-scale features from input images, guiding the diffusion process to exploit the generative diffusion prior for image colorization and thereby achieving better consistency between the input and output images. Additionally, the controllable feature modulation block strikes a balance between colorfulness and precision through an adjustable coefficient $\alpha$. By integrating Stable Diffusion with these guidance mechanisms, our model overcomes previous limitations and showcases the potential of advanced generative models to produce highly realistic and contextually appropriate colorized images, with significant impact on applications such as historical restoration and contemporary creative processes.
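For illustration only, the minimal sketch below shows one plausible form a controllable feature modulation block with an adjustable coefficient $\alpha$ could take: encoder features predict a per-pixel scale and shift applied to the diffusion features, and $\alpha$ interpolates between the unmodulated (more colorful) and fully modulated (more faithful) paths. The class, parameter names, and channel sizes are hypothetical assumptions and are not taken from the paper.

```python
import torch
import torch.nn as nn

class ControllableFeatureModulation(nn.Module):
    """Hypothetical sketch: blends pixel-level encoder guidance into
    diffusion features with an adjustable coefficient alpha."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolutions predicting per-pixel scale and shift from encoder features
        self.to_scale = nn.Conv2d(channels, channels, kernel_size=1)
        self.to_shift = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, diffusion_feat, encoder_feat, alpha: float = 1.0):
        # alpha = 0 -> unmodified diffusion features (freer, more colorful output)
        # alpha = 1 -> fully modulated by the grayscale-image encoder (more faithful output)
        scale = self.to_scale(encoder_feat)
        shift = self.to_shift(encoder_feat)
        modulated = diffusion_feat * (1 + scale) + shift
        return (1 - alpha) * diffusion_feat + alpha * modulated

# Usage sketch: blend multi-scale encoder features into one UNet block's activations.
block = ControllableFeatureModulation(channels=320)
x = torch.randn(1, 320, 64, 64)   # diffusion UNet features (assumed shape)
g = torch.randn(1, 320, 64, 64)   # pixel-level encoder features at the same scale
y = block(x, g, alpha=0.7)        # trade colorfulness against fidelity at inference time
```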
Keywords