IEEE Access (Jan 2024)

GACnet-Text-to-Image Synthesis With Generative Models Using Attention Mechanisms With Contrastive Learning

  • Md. Ahsan Habib,
  • Md. Anwar Hussen Wadud,
  • Lubna Yeasmin Pinky,
  • Mehedi Hasan Talukder,
  • Mohammad Motiur Rahman,
  • M. F. Mridha,
  • Yuichi Okuyama,
  • Jungpil Shin

DOI: https://doi.org/10.1109/ACCESS.2023.3342866
Journal volume & issue: Vol. 12, pp. 9572–9585

Abstract

The generation of high-quality images from textual descriptions is a challenging task at the intersection of computer vision and natural language processing. Text-to-image synthesis, an active research topic, aims to produce high-fidelity images from written descriptions. This study proposes a hybrid approach that combines conditional generative adversarial networks (C-GAN), attention mechanisms, and contrastive learning (C-GAN+ATT+CL) and evaluates it on a dataset of diverse text-image pairs. We propose a two-step method to improve image quality: generative adversarial networks (GANs) with attention mechanisms first create low-resolution images, and a contrastive learning module then refines them into high-resolution outputs. The GANs are trained on datasets of low-resolution text-image pairs, while the contrastive learning module is trained on a separate dataset of high-resolution images. Among the methods compared, the conditional GAN with attention mechanism and contrastive learning delivers state-of-the-art performance in image quality, diversity, and visual realism. The results demonstrate that the proposed approach outperforms all other methods, achieving an Inception Score (IS) of 35.23, a Fréchet Inception Distance (FID) of 18.2, and an R-precision of 89.14. These findings show that our "C-GAN+ATT+CL" approach significantly improves image quality and diversity and opens promising directions for further research.
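
To make the two-stage pipeline concrete, the following PyTorch sketch shows one plausible reading of the abstract: a text-conditioned generator with a cross-attention step produces low-resolution images, and a refinement module trained with a contrastive (InfoNCE-style) loss upscales them toward high-resolution targets. All module names, layer sizes, resolutions, and the loss formulation below are illustrative assumptions, not the authors' published implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnGenerator(nn.Module):
    """Stage 1 (assumed design): text-conditioned generator with cross-attention."""
    def __init__(self, noise_dim=100, text_dim=256, ch=64):
        super().__init__()
        self.ch = ch
        self.fc = nn.Linear(noise_dim + text_dim, ch * 4 * 4)
        self.text_proj = nn.Linear(text_dim, ch)
        self.attn = nn.MultiheadAttention(embed_dim=ch, num_heads=4, batch_first=True)
        self.up = nn.Sequential(  # 4x4 feature map -> 64x64 low-resolution image
            nn.ConvTranspose2d(ch, ch, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(ch, ch, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(ch, ch, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(ch, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, z, text_emb):
        h = self.fc(torch.cat([z, text_emb], dim=1)).view(-1, self.ch, 4, 4)
        b, c, hh, ww = h.shape
        tokens = h.flatten(2).transpose(1, 2)        # (B, 16, ch) spatial tokens
        txt = self.text_proj(text_emb).unsqueeze(1)  # (B, 1, ch) text token
        attended, _ = self.attn(tokens, txt, txt)    # spatial tokens attend to the text
        h = (tokens + attended).transpose(1, 2).reshape(b, c, hh, ww)
        return self.up(h)                            # (B, 3, 64, 64)

class ContrastiveRefiner(nn.Module):
    """Stage 2 (assumed design): upscale 64x64 -> 128x128, plus an embedding head."""
    def __init__(self, ch=64, emb_dim=128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, ch, 3, 1, 1), nn.ReLU(),
            nn.ConvTranspose2d(ch, ch, 4, 2, 1), nn.ReLU(),
        )
        self.to_rgb = nn.Conv2d(ch, 3, 3, 1, 1)
        self.embed = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                   nn.Linear(ch, emb_dim))

    def forward(self, low_res):
        feats = self.body(low_res)
        high_res = torch.tanh(self.to_rgb(feats))
        return high_res, F.normalize(self.embed(feats), dim=1)

def info_nce(fake_emb, real_emb, temperature=0.1):
    """InfoNCE loss: each refined image is pulled toward its paired real image."""
    logits = fake_emb @ real_emb.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(fake_emb.size(0))        # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Toy forward pass; random tensors stand in for text embeddings and real images.
gen, refiner = AttnGenerator(), ContrastiveRefiner()
z, text_emb = torch.randn(8, 100), torch.randn(8, 256)
low = gen(z, text_emb)                              # stage 1: (8, 3, 64, 64)
high, fake_emb = refiner(low)                       # stage 2: (8, 3, 128, 128)
real_emb = F.normalize(torch.randn(8, 128), dim=1)  # placeholder real-image embeddings
print(high.shape, info_nce(fake_emb, real_emb).item())

In an actual training loop, the GAN stage would additionally use a text-conditioned discriminator and adversarial losses, and the real-image embeddings would come from an encoder over the separate high-resolution dataset mentioned in the abstract; those components are omitted here for brevity.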

Keywords