Enhanced Forensic Speaker Verification Using a Combination of DWT and MFCC Feature Warping in the Presence of Noise and Reverberation Conditions

Ahmed Kamil Hasan Al-Ali; David Dean; Bouchra Senadji; Vinod Chandran; Ganesh R. Naik

doi:10.1109/ACCESS.2017.2728801

IEEE Access (Jan 2017)

Affiliations

Ahmed Kamil Hasan Al-Ali: Queensland University of Technology, Brisbane, QLD, Australia
David Dean: Queensland University of Technology, Brisbane, QLD, Australia
Bouchra Senadji: Queensland University of Technology, Brisbane, QLD, Australia
Vinod Chandran: Queensland University of Technology, Brisbane, QLD, Australia
Ganesh R. Naik: MARCS Institute, Western Sydney University, Sydney, NSW, Australia

DOI: https://doi.org/10.1109/ACCESS.2017.2728801
Journal volume & issue: Vol. 5, pp. 15400–15413

Abstract

Environmental noise and reverberation severely degrade the performance of forensic speaker verification, and robust feature extraction plays an important role in improving it. This paper investigates the effectiveness of combining mel frequency cepstral coefficients (MFCCs) with MFCCs extracted from the discrete wavelet transform (DWT) of the speech, with and without feature warping, for improving modern identity-vector (i-vector)-based speaker verification performance in the presence of noise and reverberation. The performance of i-vector speaker verification was evaluated using six feature extraction techniques: MFCC, feature-warped MFCC, DWT-MFCC, feature-warped DWT-MFCC, a fusion of DWT-MFCC and MFCC features, and a fusion of feature-warped DWT-MFCC and feature-warped MFCC features. We evaluated performance using the Australian Forensic Voice Comparison and QUT-NOISE databases under three conditions: noise only, reverberation only, and combined noise and reverberation. Our results indicate that the fusion of feature-warped DWT-MFCC and feature-warped MFCC is superior to the other feature extraction techniques under reverberation, under combined noise and reverberation, and under environmental noise at the majority of signal-to-noise ratios (SNRs). At 0-dB SNR, this fusion reduces the average equal error rate relative to feature-warped MFCC by 21.33% under various types of environmental noise only, 20.00% under reverberation, and 13.28% under combined noise and reverberation. The approach can be used for improving the performance of forensic speaker verification and may be utilized for preparing legal evidence in court.
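As a rough illustration of the building blocks named in the abstract, the following Python sketch shows a one-level DWT, feature warping of a single cepstral stream, and a simple fusion step. It is not the authors' implementation: the abstract does not state the wavelet family, the decomposition level, the warping window, or how the fusion is done, so the Haar wavelet, the 301-frame window, fusion by per-frame concatenation, and the function names `haar_dwt`, `feature_warp`, and `fuse` are all assumptions made for illustration. MFCC extraction itself is assumed to happen elsewhere.

```python
from statistics import NormalDist


def haar_dwt(signal):
    """One-level Haar DWT (assumed wavelet): return (approximation, detail)."""
    s = 2 ** 0.5
    n = len(signal) - len(signal) % 2  # drop a trailing odd sample
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, n, 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, n, 2)]
    return approx, detail


def feature_warp(stream, window=301):
    """Warp one cepstral coefficient stream towards a standard normal.

    Each value is ranked inside a sliding window centred on its frame and
    replaced by the standard-normal quantile of that rank. The window is
    truncated at the start and end of the stream. The default of 301
    frames is an assumption, not a value taken from the abstract.
    """
    norm = NormalDist()
    half = window // 2
    warped = []
    for t, x in enumerate(stream):
        lo, hi = max(0, t - half), min(len(stream), t + half + 1)
        win = stream[lo:hi]
        rank = sum(1 for v in win if v < x) + 1  # 1-based ascending rank
        warped.append(norm.inv_cdf((rank - 0.5) / len(win)))
    return warped


def fuse(frames_a, frames_b):
    """Fuse two feature sets by concatenating per-frame vectors (assumed)."""
    return [a + b for a, b in zip(frames_a, frames_b)]
```

In a pipeline built on these assumptions, MFCCs would be computed once from the full-band speech and once from the DWT subband signals. Each coefficient stream would then be passed through `feature_warp`, and the two warped feature sets combined with `fuse` before i-vector extraction.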

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
