Single image super-resolution with denoising diffusion GANS

Heng Xiao; Xin Wang; Jun Wang; Jing-Ye Cai; Jian-Hua Deng; Jing-Ke Yan; Yi-Dong Tang

doi:10.1038/s41598-024-52370-3

Scientific Reports (Feb 2024)

Affiliations

Heng Xiao: School of Computer Science and Information Security, Guilin University of Electronic Technology
Xin Wang: School of Computer Science and Information Security, Guilin University of Electronic Technology
Jun Wang: School of Ocean Engineering, Guilin University of Electronic Technology
Jing-Ye Cai: School of Information and Software Engineering, University of Electronic Science and Technology of China
Jian-Hua Deng: School of Information and Software Engineering, University of Electronic Science and Technology of China
Jing-Ke Yan: School of Ocean Engineering, Guilin University of Electronic Technology
Yi-Dong Tang: School of Computer Science and Information Security, Guilin University of Electronic Technology

Journal volume & issue: Vol. 14, no. 1
pp. 1 – 18

Abstract

Single image super-resolution (SISR) refers to reconstructing a high-resolution (HR) image from a corresponding low-resolution (LR) input. Because a single LR image corresponds to multiple HR images, this is an ill-posed problem. In recent years, generative model-based SISR methods have outperformed conventional SISR methods. However, SISR methods based on GANs, VAEs, and flows suffer from unstable training, low sampling quality, and expensive computational cost. These models also struggle to achieve the trifecta of diverse, high-quality, and fast sampling. In particular, denoising diffusion probabilistic models produce samples of impressive variety and quality, but their expensive sampling cost prevents them from being readily applied in the real world. In this paper, we find that the fundamental reason for the slow sampling speed of diffusion-based SISR methods lies in the Gaussian assumption used in previous diffusion models, which holds only for small step sizes. We propose Single Image Super-Resolution with Denoising Diffusion GANS (SRDDGAN) to achieve large-step denoising, sample diversity, and training stability. Our approach combines denoising diffusion models with GANs to generate images conditionally, using a multimodal conditional GAN to model each denoising step. SRDDGAN outperforms existing diffusion model-based methods in PSNR and perceptual quality metrics, while the added latent variable Z explores the diversity of the likely HR spatial domain. Notably, SRDDGAN infers nearly 11 times faster than the diffusion-based SR3, making it a more practical solution for real-world applications.
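
The abstract describes large-step denoising in which a conditional GAN models each reverse step, but it does not spell out the sampling procedure. The sketch below is one plausible reading, not the authors' implementation: it assumes the generator predicts a clean HR image from the noisy image, the LR input, the step index, and a latent vector, and that the next state is drawn from the standard DDPM posterior. The names `toy_generator` and `sample`, the beta schedule, the step count, and the latent size are all hypothetical, and the generator is a nearest-neighbour placeholder rather than a trained network.

```python
import numpy as np


def toy_generator(x_t, lr, t, z, scale):
    """Hypothetical stand-in for a trained generator G(x_t, lr, t, z).

    It ignores x_t and t, upsamples the LR image by nearest neighbour,
    and adds a small latent-dependent offset so that z changes the output.
    """
    upsampled = np.kron(lr, np.ones((scale, scale)))
    return upsampled + 0.01 * z.mean()


def sample(lr, scale=4, num_steps=4, seed=0, generator=toy_generator):
    """Few-step reverse diffusion conditioned on a 2-D LR image (assumed form)."""
    rng = np.random.default_rng(seed)

    # Assumed schedule: a few large betas instead of many small ones.
    betas = np.linspace(0.1, 0.7, num_steps)
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    alpha_bar_prev = np.concatenate(([1.0], alpha_bar[:-1]))

    h, w = lr.shape
    x_t = rng.standard_normal((h * scale, w * scale))  # start from pure noise

    for t in reversed(range(num_steps)):
        z = rng.standard_normal(8)  # fresh latent per step (assumed size 8)
        x0_pred = generator(x_t, lr, t, z, scale)

        # Coefficients of the Gaussian posterior q(x_{t-1} | x_t, x_0).
        coef_x0 = np.sqrt(alpha_bar_prev[t]) * betas[t] / (1.0 - alpha_bar[t])
        coef_xt = np.sqrt(alphas[t]) * (1.0 - alpha_bar_prev[t]) / (1.0 - alpha_bar[t])
        var = betas[t] * (1.0 - alpha_bar_prev[t]) / (1.0 - alpha_bar[t])

        noise = rng.standard_normal(x_t.shape)
        x_t = coef_x0 * x0_pred + coef_xt * x_t + np.sqrt(var) * noise

    return x_t


if __name__ == "__main__":
    lr_image = np.array([[0.0, 1.0], [0.5, 0.25]])
    hr_a = sample(lr_image, seed=0)
    hr_b = sample(lr_image, seed=1)
    print(hr_a.shape, float(np.abs(hr_a - hr_b).max()))
```

Under these assumptions the loop runs only `num_steps` generator calls, which is where a speed-up over a many-step diffusion sampler would come from, and different seeds give different outputs for the same LR input because the latent vector changes.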

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
