Enhanced Text-to-Image Synthesis With Self-Supervision

Yong Xuan Tan; Chin Poo Lee; Mai Neo; Kian Ming Lim; Jit Yan Lim

doi:10.1109/ACCESS.2023.3268869

IEEE Access (Jan 2023)

Affiliations

Yong Xuan Tan: Faculty of Information Science and Technology, Multimedia University, Jalan Ayer Keroh Lama, Melaka, Malaysia
Chin Poo Lee: Faculty of Information Science and Technology, Multimedia University, Jalan Ayer Keroh Lama, Melaka, Malaysia
Mai Neo: Faculty of Creative Multimedia, Multimedia University, Persiaran Multimedia, Cyberjaya, Selangor, Malaysia
Kian Ming Lim: Faculty of Information Science and Technology, Multimedia University, Jalan Ayer Keroh Lama, Melaka, Malaysia
Jit Yan Lim: Faculty of Information Science and Technology, Multimedia University, Jalan Ayer Keroh Lama, Melaka, Malaysia

DOI: https://doi.org/10.1109/ACCESS.2023.3268869
Journal volume & issue: Vol. 11, pp. 39508–39519

Abstract

Text-to-image synthesis is a difficult task, especially in low-data regimes, where the number of training samples is limited. To address this challenge, the Self-Supervision Text-to-Image Generative Adversarial Networks (SS-TiGAN) method is proposed. It employs a bi-level architecture that uses self-supervision to increase the number of training samples by generating rotation variants. This, in turn, maximizes the diversity of the model representation and enables the exploration of high-level object information for more detailed image construction. Beyond self-supervision, SS-TiGAN also investigates several techniques for addressing the stability issues that arise in Generative Adversarial Networks. With these techniques, SS-TiGAN achieves new state-of-the-art performance on two benchmark datasets, Oxford-102 and CUB. These results demonstrate the effectiveness of SS-TiGAN in synthesizing high-quality, realistic images from text descriptions under low-data regimes.
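
The abstract does not say which rotations are used or how they enter the loss. As a minimal sketch only, the following assumes the common rotation-prediction setup of four 90-degree rotations, each paired with a class label that an auxiliary head could be trained to predict. The function name make_rotation_variants, the four-angle choice, and the array layout (N, H, W, C) are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def make_rotation_variants(images):
    """Return rotated copies of a batch plus a rotation label for each copy.

    Assumption (not from the paper): rotations are 0, 90, 180 and 270 degrees.
    `images` has shape (N, H, W, C) with H == W so all variants share a shape.
    """
    variants = []
    labels = []
    for k in range(4):
        # np.rot90 rotates counterclockwise by k * 90 degrees in the (H, W) plane.
        variants.append(np.rot90(images, k=k, axes=(1, 2)))
        # Label k marks which rotation was applied to every image in this group.
        labels.append(np.full(len(images), k, dtype=np.int64))
    return np.concatenate(variants, axis=0), np.concatenate(labels, axis=0)


# Toy batch: 2 square "images" of size 4x4 with 3 channels.
batch = np.arange(2 * 4 * 4 * 3, dtype=np.float32).reshape(2, 4, 4, 3)
rotated, rotation_labels = make_rotation_variants(batch)
```

Under this assumption each original image yields four training inputs, so a batch of N images becomes 4N inputs with labels in {0, 1, 2, 3}. How SS-TiGAN actually combines such variants with its bi-level architecture is described in the full paper, not here.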

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
