ARO-The Scientific Journal of Koya University (Sep 2024)

Time Series-Based Spoof Speech Detection Using Long Short-Term Memory and Bidirectional Long Short-Term Memory

  • Arsalan R. Mirza
  • Abdulbasit K. Al-Talabani

DOI
https://doi.org/10.14500/aro.11636
Journal volume & issue
Vol. 12, no. 2

Abstract

Detecting fake speech is crucial for the reliability of voice-based authentication systems. Traditional methods often struggle because they cannot model the complex temporal patterns in speech. Our study introduces a deep learning approach using Long Short-Term Memory (LSTM) and Bidirectional LSTM (BiLSTM) models, tailored to identifying fake speech from its temporal characteristics. The models learn these patterns directly from speech signals represented by cepstral features, namely Mel-frequency cepstral coefficients (MFCC) and Constant Q cepstral coefficients (CQCC), and by features from the open-source Speech and Music Interpretation by Large-space Extraction (OpenSMILE) toolkit. We evaluate on the ASVspoof 2019 Logical Access dataset using the minimum tandem detection cost function (min-tDCF), Equal Error Rate (EER), Recall, Precision, and F1-score. Our results show that LSTM and BiLSTM models significantly enhance the reliability of spoof speech detection systems.
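
To make one of the reported metrics concrete, the sketch below shows one common way to estimate the Equal Error Rate from detection scores. It is an illustration only, not the authors' evaluation code: the function name `equal_error_rate`, the convention that higher scores mean "more likely bona fide", and the toy scores are all assumptions introduced here.

```python
import numpy as np

def equal_error_rate(bonafide_scores, spoof_scores):
    """Estimate the EER and its threshold (hypothetical helper).

    Assumes higher scores indicate bona fide speech.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    # Candidate thresholds: every distinct observed score, ascending.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))
    # False rejection rate: bona fide trials scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoof trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    # The EER is taken where the two error rates are closest to each other.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]

# Toy scores (made up for illustration, not data from the study).
eer, threshold = equal_error_rate([0.9, 0.6, 0.4, 0.8], [0.1, 0.5, 0.7, 0.2])
```

With these toy scores, one bona fide trial and one spoof trial are misclassified at the chosen threshold. A lower EER indicates better separation between bona fide and spoofed speech.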

Keywords