IEEE Access (Jan 2024)

Anomaly Detection of Deepfake Audio Based on Real Audio Using Generative Adversarial Network Model

  • Daeun Song,
  • Nayoung Lee,
  • Jiwon Kim,
  • Eunjung Choi

DOI: https://doi.org/10.1109/ACCESS.2024.3506973
Journal volume & issue: Vol. 12, pp. 184311 – 184326

Abstract

Deepfake audio causes damage not only to individuals and companies but also to nations; therefore, research on deepfake audio detection technology is crucial. Most existing deepfake audio detection research has been conducted using supervised learning; however, when a new type of synthetic deepfake audio emerges, real-time detection becomes difficult because of the limitations of supervised learning. Therefore, this paper proposes a new anomaly detection technique for identifying deepfake audio using unsupervised learning. The method learns the feature distribution of a large number of real human voices and then calculates an anomaly score for each voice to determine whether it is deepfake. In this study, speech was converted into images using two preprocessing methods, the mel-spectrogram and mel-frequency cepstral coefficients (MFCC). Subsequently, the parameters of the GANomaly and f-AnoGAN models, which are effective in detecting anomalies in speech, were tuned and the models were trained in an unsupervised manner. The most effective result, an F1-score of 0.93, was obtained by combining mel-spectrogram imaging with training using the GANomaly model.
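To make the described pipeline concrete, the sketch below shows how speech might be converted into the two image-like representations mentioned in the abstract and how an anomaly score could be thresholded. This is a minimal sketch under assumptions: the paper does not specify its tooling, so librosa is used here for feature extraction, and `score_fn` is a hypothetical stand-in for a trained GANomaly or f-AnoGAN scorer; the percentile-based threshold is likewise illustrative, not the authors' procedure.

```python
# Minimal sketch (assumptions): librosa for preprocessing; the trained
# GANomaly / f-AnoGAN anomaly scorer is represented by a placeholder function.
import numpy as np
import librosa


def audio_to_features(path, sr=16000, n_mels=128, n_mfcc=40):
    """Convert a speech file into the two image-like representations
    described in the abstract: a log mel-spectrogram and MFCCs."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel, ref=np.max)          # mel-spectrogram "image"
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # MFCC "image"
    return log_mel, mfcc


def choose_threshold(real_scores, percentile=95):
    """Illustrative threshold selection: a high percentile of anomaly scores
    computed on held-out *real* speech, since only real audio is used in training."""
    return np.percentile(real_scores, percentile)


def is_deepfake(features, score_fn, threshold):
    """Flag a sample as deepfake when its anomaly score exceeds the threshold.

    `score_fn` is hypothetical here; in GANomaly the score is typically the
    distance between the input's latent code and the latent code of its
    reconstruction, learned from real speech only.
    """
    return score_fn(features) > threshold
```

In this unsupervised setting, the model never sees deepfake audio during training; synthetic speech is expected to reconstruct poorly and therefore receive a higher anomaly score than real speech.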

Keywords