Voice spoofing detection for multiclass attack classification using deep learning

Jason Boyd; Muhammad Fahim; Oluwafemi Olukoya

Machine Learning with Applications (Dec 2023)

Affiliations

Jason Boyd: School of Electronics, Electrical Engineering and Computer Science, Queen’s University Belfast, BT9 5BN, United Kingdom
Muhammad Fahim: School of Electronics, Electrical Engineering and Computer Science, Queen’s University Belfast, BT9 5BN, United Kingdom
Oluwafemi Olukoya (corresponding author): School of Electronics, Electrical Engineering and Computer Science, Queen’s University Belfast, BT9 5BN, United Kingdom

Journal volume & issue: Vol. 14, p. 100503

Abstract

Voice biometric authentication is gaining adoption in organisations that perform high-volume identity verification and for controlling access to physical and virtual spaces. In this form of authentication, a user’s identity is verified by their voice. However, these systems are susceptible to voice spoofing: malicious actors use speech synthesis, voice conversion or imitation, and recorded replays to fool the Automatic Speaker Verification (ASV) system or to carry out spam communications. In this work, we formulate the voice spoofing countermeasure both as a binary classification problem that separates real from fake audio, and as a multiclass classification problem that detects voice conversion, synthesis and replay attacks. We investigated numerous audio features and examined each feature’s capability alongside state-of-the-art deep learning algorithms, including convolutional neural networks (CNN), WaveNet, and two recurrent neural network variants: Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) models. Using a large dataset of 419,426 audio files, we evaluated the effectiveness of the deep learning models against voice spoofing attacks. The binary-class CNN achieved a false positive rate (FPR) of 0.0216, while the multiclass solutions using CNN, WaveNet, LSTM and GRU achieved FPRs of 0.003, 0.0260, 0.0302 and 0.0358, respectively. We extended the evaluation to real-time classification of microphone voice audio and user-uploaded audio to demonstrate the practical implications and deployability of the models.
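The false positive rate quoted in the abstract is FP / (FP + TN). The sketch below shows one possible way to compute it per class in a multiclass setting by treating each class in turn as the positive class (one-vs-rest) and then averaging. The abstract does not say how the authors aggregated FPR across classes, so the one-vs-rest averaging, the label names and the toy predictions here are all assumptions made for illustration only.

```python
def false_positive_rate(y_true, y_pred, positive):
    """Return FP / (FP + TN), treating `positive` as the positive class."""
    # Only samples whose true label is NOT `positive` can be FP or TN.
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return fp / (fp + tn) if (fp + tn) else 0.0


def per_class_fpr(y_true, y_pred, labels):
    """One-vs-rest FPR for every label (an assumed aggregation scheme)."""
    return {label: false_positive_rate(y_true, y_pred, label) for label in labels}


# Hypothetical label names and toy data, not taken from the paper.
labels = ["bonafide", "conversion", "synthesis", "replay"]
y_true = ["bonafide", "bonafide", "conversion", "synthesis", "replay", "replay"]
y_pred = ["bonafide", "replay", "conversion", "synthesis", "replay", "bonafide"]

rates = per_class_fpr(y_true, y_pred, labels)
macro_fpr = sum(rates.values()) / len(rates)  # unweighted mean over classes

print(rates)
print(f"macro-averaged FPR: {macro_fpr:.4f}")
```

In the binary case the same function applies with two labels; which class counts as "positive" (genuine or spoofed audio) is a convention that has to be fixed before FPR values can be compared across systems.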

Published in Machine Learning with Applications

ISSN: 2666-8270 (Online)
Publisher: Elsevier
Country of publisher: United Kingdom
LCC subjects: Science: Science (General): Cybernetics; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.journals.elsevier.com/machine-learning-with-applications
