Algorithms (Dec 2024)

Data Augmentation for Voiceprint Recognition Using Generative Adversarial Networks

  • Yao-San Lin,
  • Hung-Yu Chen,
  • Mei-Ling Huang,
  • Tsung-Yu Hsieh

DOI
https://doi.org/10.3390/a17120583
Journal volume & issue
Vol. 17, no. 12
p. 583

Abstract

Voiceprint recognition systems often face challenges related to limited and diverse datasets, which hinder their performance and generalization capabilities. This study proposes a novel approach that integrates generative adversarial networks (GANs) for data augmentation and convolutional neural networks (CNNs) with mel-frequency cepstral coefficients (MFCCs) for voiceprint classification. Experimental results demonstrate that the proposed methodology improves recognition accuracy by up to 15% in low-resource scenarios. The optimal ratio of real to GAN-generated samples was determined to be 3:2, which balanced dataset diversity and model performance. In specific cases, the model achieved an accuracy of 96.6%, showcasing its effectiveness in capturing unique voice characteristics while mitigating overfitting. These results highlight the potential of combining GAN-augmented data and CNN-based classification to enhance voiceprint recognition in diverse and resource-constrained environments.
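
As a rough illustration of the 3:2 real-to-generated ratio reported above, the following sketch shows one way a training set could be assembled at that ratio. It is not the authors' code: the function name `mix_real_and_synthetic`, its parameters, and the use of plain Python lists as stand-ins for MFCC feature arrays are all assumptions made for this example.

```python
import random


def mix_real_and_synthetic(real, synthetic, real_parts=3, synth_parts=2, seed=0):
    """Build a shuffled training set with a real:synthetic ratio of
    real_parts:synth_parts (3:2 by default).

    Hypothetical helper for illustration; `real` and `synthetic` are
    lists of samples (e.g. MFCC feature arrays in a real pipeline).
    """
    # Keep every real sample and work out how many generated samples
    # are needed to reach the target ratio (rounded down).
    n_synth = (len(real) * synth_parts) // real_parts
    if n_synth > len(synthetic):
        raise ValueError("not enough synthetic samples for the requested ratio")

    rng = random.Random(seed)
    chosen = rng.sample(synthetic, n_synth)

    # Tag each sample with its origin so the mix can be inspected later.
    mixed = [(x, "real") for x in real] + [(x, "gan") for x in chosen]
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    real = list(range(30))               # placeholder for 30 real utterances
    synthetic = list(range(100, 150))    # placeholder for 50 generated utterances
    mixed = mix_real_and_synthetic(real, synthetic)
    n_gan = sum(1 for _, origin in mixed if origin == "gan")
    print(len(mixed), n_gan)
```

With 30 real samples, the sketch draws 20 generated samples, giving 50 training items at 3:2. Whether the ratio should be fixed relative to the real set, as here, or relative to the total is a design choice the abstract does not specify.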

Keywords