Animated Avatar Generation Technology Research Based on Deep Convolutional Generative Adversarial Network Integrated With Self-Attention and Spectral Normalization

Houmin Wu; Sangguk Lim; Bin Xiao

doi:10.1109/access.2024.3482989

IEEE Access (Jan 2024)

Affiliations

Houmin Wu: School of Information Engineering, Guangzhou Vocational College of Technology and Business, Guangzhou, China
Sangguk Lim: Department of Computer and Information Engineering, Youngsan University, Yangsan-si, Republic of Korea
Bin Xiao: School of Information Engineering, Guangzhou Vocational College of Technology and Business, Guangzhou, China

Journal volume & issue: Vol. 12, pp. 154614–154630

Abstract

The burgeoning field of large language models (LLMs), exemplified by DALL-E and Stable Diffusion, has made image generation a reality. However, the GPU training these models require is computationally intensive and costly. Moreover, while many image datasets are accessible, specialized anime avatar datasets remain scarce and are often entangled in copyright disputes. This scarcity presents a research opportunity: developing a cost-effective, user-friendly anime avatar generation technique that circumvents these challenges. This paper introduces a method for creating animated avatars that builds on the deep convolutional generative adversarial network (DCGAN) architecture and enhances it with Self-Attention (SA) and Spectral Normalization (SN), termed SA+SN-DCGAN. Integrating the SA mechanism into the generator improves the quality of the output, while applying SN to the discriminator mitigates vanishing and exploding gradients, thereby reducing the likelihood of over-fitting. We sourced anime avatars from reputable public domains and standardized them using OpenCV, and used a grid search to tune the model hyper-parameters. After 300 epochs of training, the generator and discriminator reached stable error rates, and the synthesized images closely matched the fidelity of authentic avatars. Comparative evaluations against prevailing models indicate that SA+SN-DCGAN produces more realistic anime avatars and performs better overall. This study contributes a new solution for anime avatar generation and lays groundwork for future research in the field.
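
To make the Spectral Normalization idea in the abstract concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It estimates the largest singular value of a weight matrix by power iteration and divides the weights by it, so the rescaled matrix has a spectral norm of about 1. The function names `spectral_norm` and `spectrally_normalize` are hypothetical, and the sketch assumes a plain 2-D list of floats; a convolution kernel would first need to be reshaped to 2-D, and a real model would typically rely on a deep learning framework instead.

```python
import math
import random


def spectral_norm(weight, n_iters=50, seed=0):
    """Estimate the largest singular value of a 2-D matrix by power iteration."""
    rng = random.Random(seed)
    rows, cols = len(weight), len(weight[0])
    # Start from a random unit vector in the input space.
    v = [rng.gauss(0.0, 1.0) for _ in range(cols)]
    norm_v = math.sqrt(sum(x * x for x in v))
    v = [x / norm_v for x in v]
    sigma = 0.0
    for _ in range(n_iters):
        # u = W v, rescaled to unit length.
        u = [sum(weight[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        norm_u = math.sqrt(sum(x * x for x in u))
        if norm_u == 0.0:
            return 0.0
        u = [x / norm_u for x in u]
        # W^T u; its length is the current estimate of the top singular value.
        wt_u = [sum(weight[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        sigma = math.sqrt(sum(x * x for x in wt_u))
        if sigma == 0.0:
            return 0.0
        v = [x / sigma for x in wt_u]
    return sigma


def spectrally_normalize(weight, n_iters=50):
    """Return a copy of `weight` divided by its estimated spectral norm."""
    sigma = spectral_norm(weight, n_iters)
    if sigma == 0.0:
        return [row[:] for row in weight]
    return [[w / sigma for w in row] for row in weight]


# Illustrative matrix (an assumption, not data from the paper).
example = [[3.0, 0.0], [0.0, 1.0]]
normalized = spectrally_normalize(example)
```

Bounding the spectral norm of each discriminator layer in this way limits how strongly the layer can amplify its input, which is the property the abstract credits with stabilizing gradients.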

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
