Universal Image Restoration with Text Prompt Diffusion

Bing Yu; Zhenghui Fan; Xue Xiang; Jiahui Chen; Dongjin Huang

doi:10.3390/s24123917

Sensors (Jun 2024)

Affiliations

Bing Yu: Shanghai Film Academy, Shanghai University, Shanghai 200072, China
Zhenghui Fan: Shanghai Film Academy, Shanghai University, Shanghai 200072, China
Xue Xiang: Shanghai Film Academy, Shanghai University, Shanghai 200072, China
Jiahui Chen: Shanghai Film Academy, Shanghai University, Shanghai 200072, China
Dongjin Huang: Shanghai Film Academy, Shanghai University, Shanghai 200072, China

DOI: https://doi.org/10.3390/s24123917
Journal volume & issue: Vol. 24, no. 12
p. 3917

Abstract

Universal image restoration (UIR) aims to accurately restore images affected by a variety of unknown degradation types and levels. Existing methods, both learning-based and prior-based, rely heavily on features extracted from the low-quality image. However, extracting degradation information from diverse low-quality images is difficult, which limits model performance. Furthermore, because UIR must handle diverse and complex types of degradation, inaccurate degradation estimates further reduce restoration quality and lead to suboptimal results. One viable way to improve UIR performance is to introduce additional priors. Current UIR methods also suffer from weak enhancement effects and limited universality. To address these issues, we propose an effective framework for universal image restoration based on a diffusion model (DM), dubbed ETDiffIR. Inspired by the remarkable performance of text prompts in image generation, we employ text prompts to improve the restoration of degraded images: the framework uses a text prompt corresponding to the low-quality image to guide the diffusion model during restoration. Specifically, we propose a novel text–image fusion block that combines the CLIP text encoder with the DA-CLIP image controller and integrates the text prompt encoding and the degradation type encoding into the time step encoding. Moreover, to reduce the computational cost of the denoising UNet in the diffusion model, we develop an efficient restoration U-shaped network (ERUNet) that achieves favorable noise prediction performance using depthwise and pointwise convolutions. We evaluate the proposed method on image dehazing, deraining, and denoising tasks, and the experimental results indicate the superiority of the proposed algorithm.
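The abstract attributes ERUNet's lower cost to depthwise and pointwise convolutions. As a generic illustration only (not the authors' ERUNet code), the NumPy sketch below factors a standard convolution into a per-channel depthwise step followed by a 1×1 pointwise step and compares weight counts. The function names are hypothetical, biases are ignored, and the 64-to-128-channel 3×3 configuration is an arbitrary example rather than a setting taken from the paper.

```python
import numpy as np


def depthwise_conv2d(x, kernels):
    """Apply one k x k filter per input channel (zero 'same' padding, stride 1).

    x: array of shape (C, H, W); kernels: array of shape (C, k, k), k odd.
    Computes cross-correlation, as deep learning 'convolution' layers do.
    """
    _, h, w = x.shape
    k = kernels.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x, dtype=float)
    for i in range(k):
        for j in range(k):
            # Each channel is weighted only by its own kernel tap: no cross-channel mixing.
            out += kernels[:, i, j][:, None, None] * xp[:, i:i + h, j:j + w]
    return out


def pointwise_conv2d(x, weights):
    """1 x 1 convolution that mixes channels independently at every pixel.

    x: (C_in, H, W); weights: (C_out, C_in).
    """
    return np.einsum("oc,chw->ohw", weights, x)


def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def separable_params(c_in, c_out, k):
    """Weight count of depthwise (k x k per channel) plus pointwise (1 x 1) layers."""
    return c_in * k * k + c_in * c_out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((64, 32, 32))   # hypothetical feature map
    dw = rng.standard_normal((64, 3, 3))    # one 3x3 filter per input channel
    pw = rng.standard_normal((128, 64))     # 1x1 channel-mixing weights
    y = pointwise_conv2d(depthwise_conv2d(x, dw), pw)
    print(y.shape)  # (128, 32, 32)
    print(conv_params(64, 128, 3), separable_params(64, 128, 3))  # 73728 8768
```

For a 3×3 kernel with 64 input and 128 output channels, the factorized form uses 8,768 weights instead of 73,728, roughly 8.4 times fewer. This is the general reason such blocks reduce cost; the actual savings in ERUNet depend on its full architecture, which the abstract does not detail.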

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors