IEEE Access (Jan 2024)
Exploring Progress in Text-to-Image Synthesis: An In-Depth Survey on the Evolution of Generative Adversarial Networks
Abstract
The emergence of generative adversarial networks (GANs) has ignited substantial interest in the domain of synthesizing images from textual descriptions. This approach has demonstrated remarkable versatility and user-friendliness in producing conditioned images, showcasing notable progress in areas like diversity, visual realism, and semantic alignment in recent years. Notwithstanding these developments, the discipline still faces difficulties, such as producing high-resolution pictures with several objects and developing trustworthy evaluation standards that are in line with human vision. The goal of this study is to provide a comprehensive overview of the state of stochastic text-to-image creation models as of right now. It examines how they have changed over the previous five years and suggests a classification system depending on the degree of supervision required. The paper highlights shortcomings, provides a critical evaluation of current approaches for assessing text-to-image synthesizing models, and suggests further study areas. These goals include improving the training of models and designs for architecture, developing more reliable assessment criteria, and fine-tuning datasets. This review, which focuses on text-to-image synthesizing, is a useful addition to earlier surveys on adversarial networks that are generative and offers guidance for future studies on the subject.
Keywords