A Study on Generating Webtoons Using Multilingual Text-to-Image Models

Kyungho Yu; Hyoungju Kim; Jeongin Kim; Chanjun Chun; Pankoo Kim

doi:10.3390/app13127278

Applied Sciences (Jun 2023)

A Study on Generating Webtoons Using Multilingual Text-to-Image Models

Kyungho Yu,
Hyoungju Kim,
Jeongin Kim,
Chanjun Chun,
Pankoo Kim

Affiliations

Kyungho Yu: Department of Computer Engineering, Chosun University, 309 Pilmun-Daero, Dong-Gu, Gwangju 61452, Republic of Korea
Hyoungju Kim: Department of Computer Engineering, Chosun University, 309 Pilmun-Daero, Dong-Gu, Gwangju 61452, Republic of Korea
Jeongin Kim: Department of Computer Engineering, Chosun University, 309 Pilmun-Daero, Dong-Gu, Gwangju 61452, Republic of Korea
Chanjun Chun: Department of Computer Engineering, Chosun University, 309 Pilmun-Daero, Dong-Gu, Gwangju 61452, Republic of Korea
Pankoo Kim: Department of Computer Engineering, Chosun University, 309 Pilmun-Daero, Dong-Gu, Gwangju 61452, Republic of Korea

DOI: https://doi.org/10.3390/app13127278
Journal volume & issue: Vol. 13, no. 12
p. 7278

Abstract

Read online

Text-to-image technology enables computers to create images from text by simulating the human process of forming mental images. GAN-based text-to-image technology involves extracting features from input text; subsequently, they are combined with noise and used as input to a GAN, which generates images similar to the original images via competition between the generator and discriminator. Although images have been extensively generated from English text, text-to-image technology based on multilingualism, such as Korean, is in its developmental stage. Webtoons are digital comic formats for viewing comics online. The webtoon creation process involves story planning, content/sketching, coloring, and background drawing, all of which require human intervention, thus being time-consuming and expensive. Therefore, this study proposes a multilingual text-to-image model capable of generating webtoon images when presented with multilingual input text. The proposed model employs multilingual BERT to extract feature vectors for multiple languages and trains a DCGAN in conjunction with the images. The experimental results demonstrate that the model can generate images similar to the original images when presented with multilingual input text after training. The evaluation metrics further support these findings, as the generated images achieved an Inception score of 4.99 and an FID score of 22.21.

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci

About the journal

Abstract

Keywords