IEEE Access (Jan 2024)
Animated Avatar Generation Technology Research Based on Deep Convolutional Generative Adversarial Network Integrated With Self-Attention and Spectral Normalization
Abstract
The rapid progress of large generative models, exemplified by DALL-E and Stable Diffusion, has made image generation a reality. However, training these models requires computationally intensive GPU resources and incurs substantial financial cost. Moreover, although many image datasets are available, specialized anime avatar datasets are scarce and are often entangled in copyright disputes. This scarcity presents a research opportunity: developing a cost-effective, user-friendly anime avatar generation technique that circumvents these challenges. This paper introduces a novel method for creating animated avatars that builds on the deep convolutional generative adversarial network (DCGAN) architecture and enhances it with Self-Attention (SA) and Spectral Normalization (SN), termed SA+SN-DCGAN. Integrating the SA mechanism into the generator improves the quality of the output, while applying SN to the discriminator mitigates vanishing or exploding gradients and thereby reduces the likelihood of over-fitting. We sourced anime avatars from reputable public sources and standardized them using OpenCV. A grid search was used to tune the model hyper-parameters. After 300 epochs of training, the generator and discriminator reached stable error rates, and the synthesized images closely resembled authentic avatars. Comparative evaluations against prevailing models indicate that the SA+SN-DCGAN method produces more realistic anime avatars and achieves better overall performance. This study contributes a solution to anime avatar generation and provides a basis for future research in the field.
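To make the role of SN concrete, the following is a minimal sketch of spectral normalization in general, not the implementation used in this paper. It assumes NumPy; the function name `spectral_normalize`, the iteration count, and the kernel shape are hypothetical choices made for illustration. The idea is to estimate the largest singular value of a layer's weight matrix by power iteration and divide the weights by it, which bounds how strongly the layer can amplify its input.

```python
import numpy as np

def spectral_normalize(weight, n_iters=100, eps=1e-12, seed=0):
    """Return (weight / sigma, sigma), where sigma estimates the largest
    singular value of the weight viewed as a 2-D matrix.
    Hypothetical helper for illustration only."""
    # View a conv kernel (out_ch, in_ch, kh, kw) as an (out_ch, rest) matrix.
    w = weight.reshape(weight.shape[0], -1)
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(w.shape[0])
    u /= np.linalg.norm(u) + eps
    # Power iteration: alternate between right and left singular vectors.
    for _ in range(n_iters):
        v = w.T @ u
        v /= np.linalg.norm(v) + eps
        u = w @ v
        u /= np.linalg.norm(u) + eps
    sigma = u @ w @ v  # estimate of the largest singular value
    return weight / sigma, sigma

# Example: normalize a randomly initialized 4x4 kernel (3 -> 8 channels).
kernel = np.random.default_rng(1).standard_normal((8, 3, 4, 4))
kernel_sn, sigma = spectral_normalize(kernel)
```

After normalization, the largest singular value of the reshaped kernel is approximately 1. Deep learning frameworks typically run only one power-iteration step per training update and reuse the vector `u` between updates; the many-iteration loop here is a simplification for a standalone example.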
Keywords