IEEE Access (Jan 2024)

Prompt-Based Learning for Image Variation Using Single Image Multi-Scale Diffusion Models

  • Jiwon Park,
  • Dasol Jeong,
  • Hyebean Lee,
  • Seunghee Han,
  • Joonki Paik

DOI
https://doi.org/10.1109/ACCESS.2024.3487215
Journal volume & issue
Vol. 12
pp. 158810–158823

Abstract

In this paper, we propose a multi-scale framework with text-based learning that uses a single image to perform image variation and text-based editing. Our approach captures the detailed internal information of a single image, enabling numerous variations while preserving the original features, and text-conditioned learning combines text and images so that text-based editing can be performed effectively from a single image. Specifically, we integrate a diffusion U-Net into the multi-scale framework to accurately capture the quality and internal structure of the input image and to generate diverse variations that maintain its original features. Additionally, we use a pre-trained Bootstrapped Language-Image Pre-training (BLIP) model to generate candidate prompts for text-based editing, and we feed the prompt that most closely matches the input image into training by exploiting the prior knowledge of Contrastive Language-Image Pre-training (CLIP). To improve accuracy during the image-editing stage, we design a contrastive loss function that strengthens the relevance between the prompt and the image. As a result, learning between text and images improves, and extensive experiments demonstrate the method's effectiveness on text-based image editing tasks. Our experiments show that the proposed method significantly improves the performance of single-image generative models and opens new possibilities for text-based image editing.
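The abstract describes generating candidate prompts with BLIP and selecting the one closest to the input image using CLIP. The paper's exact procedure is not given here, but the step can be illustrated with a minimal sketch using the Hugging Face transformers checkpoints for BLIP captioning and CLIP scoring; the file path, number of candidates, and model names are assumptions, not the authors' configuration.

```python
# Sketch: sample candidate prompts with BLIP, rank them with CLIP,
# and keep the prompt most similar to the input image (assumed setup).
import torch
from PIL import Image
from transformers import (BlipProcessor, BlipForConditionalGeneration,
                          CLIPProcessor, CLIPModel)

image = Image.open("input.jpg").convert("RGB")  # the single training image (path assumed)

# 1) BLIP: sample several candidate captions for the image.
blip_proc = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
blip_inputs = blip_proc(images=image, return_tensors="pt")
ids = blip.generate(**blip_inputs, do_sample=True, num_return_sequences=5,
                    max_new_tokens=30)
candidates = blip_proc.batch_decode(ids, skip_special_tokens=True)

# 2) CLIP: score each candidate against the image and keep the best match.
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_inputs = clip_proc(text=candidates, images=image,
                        return_tensors="pt", padding=True)
with torch.no_grad():
    sims = clip(**clip_inputs).logits_per_image[0]  # one row: image vs. each prompt
best_prompt = candidates[sims.argmax().item()]      # fed into training as the text condition
```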
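The contrastive loss that ties the prompt to the image is only named in the abstract, not specified. A common formulation consistent with the description is a symmetric InfoNCE loss over CLIP embeddings; the sketch below is one such assumed instantiation (function name, temperature, and batching are illustrative, not the paper's definition).

```python
# Sketch: CLIP-space contrastive loss between prompt and image embeddings
# (assumed InfoNCE-style formulation; not the paper's exact loss).
import torch
import torch.nn.functional as F

def prompt_image_contrastive_loss(img_emb: torch.Tensor,
                                  txt_emb: torch.Tensor,
                                  temperature: float = 0.07) -> torch.Tensor:
    """Pull each image embedding toward its paired prompt embedding and
    away from the other prompts in the batch, and vice versa."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature                # (B, B) similarity matrix
    targets = torch.arange(img.size(0), device=img.device)
    # Symmetric cross-entropy over image->text and text->image directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```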

Keywords