IEEE Access (Jan 2024)

Exploring Progress in Text-to-Image Synthesis: An In-Depth Survey on the Evolution of Generative Adversarial Networks

  • Md Ahsan Habib,
  • Md Anwar Hussen Wadud,
  • Md Fazlul Karim Patwary,
  • Mohammad Motiur Rahman,
  • M. F. Mridha,
  • Yuichi Okuyama,
  • Jungpil Shin

DOI
https://doi.org/10.1109/ACCESS.2024.3435541
Journal volume & issue
Vol. 12
pp. 178401–178440

Abstract


The emergence of generative adversarial networks (GANs) has ignited substantial interest in synthesizing images from textual descriptions. This approach has proven versatile and user-friendly for producing conditioned images, with notable recent progress in diversity, visual realism, and semantic alignment. Despite these advances, the field still faces difficulties, such as generating high-resolution images containing multiple objects and developing trustworthy evaluation standards that agree with human perception. This study provides a comprehensive overview of the current state of text-to-image generation models, examines how they have evolved over the past five years, and proposes a taxonomy based on the degree of supervision required. The paper highlights shortcomings, critically evaluates current approaches for assessing text-to-image synthesis models, and suggests directions for further research, including improving model training and architectural design, developing more reliable evaluation criteria, and refining datasets. By focusing on text-to-image synthesis, this review complements earlier surveys on generative adversarial networks and offers guidance for future work on the topic.

Keywords