IEEE Access (Jan 2019)

Toward Robust Audio Spoofing Detection: A Detailed Comparison of Traditional and Learned Features

  • B. T. Balamurali,
  • Kinwah Edward Lin,
  • Simon Lui,
  • Jer-Ming Chen,
  • Dorien Herremans

DOI
https://doi.org/10.1109/ACCESS.2019.2923806
Journal volume & issue
Vol. 7
pp. 84229 – 84241

Abstract

Automatic speaker verification, like every other biometric system, is vulnerable to spoofing attacks. Using only a few minutes of recorded voice from a genuine client of a speaker verification system, attackers can develop a variety of spoofing attacks that might trick such systems. Detecting these attacks using the audio cues present in the recordings is an important challenge. Most existing spoofing detection systems depend on knowing which spoofing technique was used. With this research, we aim to overcome this limitation by examining robust audio features, both traditional and those learned through an autoencoder, that generalize to different types of replay spoofing. Furthermore, we provide a detailed account of all the steps necessary for setting up state-of-the-art audio feature detection, preprocessing, and postprocessing, so that machine learning researchers who are not audio experts can implement such systems. Finally, we evaluate the performance of our robust replay spoofing detection system with a wide variety of, and different combinations of, both extracted and machine-learned audio features on the “out in the wild” ASVspoof 2017 dataset. This dataset contains a variety of new replay spoofing configurations. Since our focus is on examining which features will ensure robustness, we base our system on a traditional Gaussian mixture model-universal background model (GMM-UBM). We then systematically investigate the relative contribution of each feature set. The fused models based on the known audio features and on the machine-learned features, respectively, achieve comparable performance, with an equal error rate (EER) of 12. The final best performing model, which obtains an EER of 10.8, is a hybrid system that contains both known and machine-learned features and is trained on an augmented dataset, thus revealing the importance of incorporating both types of features when developing a robust spoofing prediction model.
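The abstract reports its results as an equal error rate (EER), the operating point at which the rate of spoofed recordings accepted as genuine equals the rate of genuine recordings rejected. The following is a minimal sketch of how an EER can be estimated from detection scores. It is not the authors' evaluation code. It assumes that higher scores indicate genuine speech (for example, a log-likelihood ratio between a genuine and a spoof model), that the result is expressed as a percentage, and the function name and the example scores are hypothetical.

```python
import numpy as np


def equal_error_rate(genuine_scores, spoof_scores):
    """Estimate the EER (in percent) from two sets of detection scores.

    Assumption: a higher score means "more likely genuine".
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.unique(np.concatenate([genuine, spoof]))

    # False rejection rate: genuine trials scoring below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance rate: spoofed trials scoring at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # The EER lies where the two error curves cross; with finitely many
    # trials, take the threshold where they are closest and average them.
    idx = np.argmin(np.abs(far - frr))
    return 100.0 * (far[idx] + frr[idx]) / 2.0


# Hypothetical scores, for illustration only.
genuine_example = [0.1, 0.4, 0.6, 0.9]
spoof_example = [0.0, 0.2, 0.5, 0.3]
eer_example = equal_error_rate(genuine_example, spoof_example)
```

A lower EER indicates better separation between genuine and spoofed recordings. Perfectly separated scores give an EER of 0, and a detector that scores at random approaches 50.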

Keywords