A survey of generative adversarial networks and their application in text-to-image synthesis

Wu Zeng; Heng-liang Zhu; Chuan Lin; Zheng-ying Xiao

doi:10.3934/era.2023362

Electronic Research Archive (Nov 2023)

Affiliations

Wu Zeng: 1. Engineering Training Center, Putian University, Putian 351100, China
Heng-liang Zhu: 2. College of Computer Science and Mathematics, Fujian University of Technology, Fuzhou 350118, China; 3. Fujian Provincial Universities Key Laboratory of Industrial Control and Data Analysis, Fuzhou 350118, China
Chuan Lin: 4. School of Mechanical, Electrical & Information Engineering, Putian University, Putian 351100, China
Zheng-ying Xiao: 1. Engineering Training Center, Putian University, Putian 351100, China

DOI: https://doi.org/10.3934/era.2023362
Journal volume & issue: Vol. 31, no. 12
pp. 7142 – 7181

Abstract

With the continuous development of science and technology, especially of devices with powerful computing capabilities, image generation based on deep learning has made significant progress. Cross-modal methods based on deep learning that generate images from text have become a hot topic of current research. Text-to-image (T2I) synthesis has applications in multiple fields of computer vision, such as image enhancement, artificial intelligence painting, games and virtual reality. T2I methods built on generative adversarial networks (GANs) can generate more realistic and diverse images, but they still face shortcomings and challenges, such as difficulty in generating complex backgrounds. This review is organized as follows. First, we introduce the basic principles and architectures of the original and classic GANs. Second, we group T2I synthesis methods into four main categories: methods based on semantic enhancement, methods based on progressive structure, methods based on attention and methods based on introducing additional signals. We introduce a selection of classic and recent T2I methods and explain their main advantages and shortcomings. Third, we describe the basic datasets and evaluation metrics used in the T2I field. Finally, we discuss prospects for future research directions. This review provides a systematic introduction to basic GAN methods and the T2I methods built on them, and can serve as a reference for researchers.

Published in Electronic Research Archive

ISSN: 2688-1594 (Online)
Publisher: AIMS Press
Country of publisher: United States
LCC subjects: Science: Mathematics; Technology: Technology (General): Industrial engineering. Management engineering: Applied mathematics. Quantitative methods
Website: https://www.aimspress.com/journal/era
