Speaker Verification Under Degraded Conditions Using Empirical Mode Decomposition Based Voice Activity Detection Algorithm

Rudramurthy M. S.; Prasad V. Kamakshi; Kumaraswamy R.

doi:10.1515/jisys-2013-0085

Journal of Intelligent Systems (Dec 2014)

Affiliations

Rudramurthy M. S.: Department of Information Science and Engineering, S.I.T., Tumkur 572 103, Karnataka State, India
Prasad V. Kamakshi: Department of Computer Science, JNTUH, Kukatpally, Hyderabad 500 085, A.P. State, India
Kumaraswamy R.: Department of Electronics and Communication Engineering, S.I.T., Tumkur 572 103, Karnataka State, India

Journal volume & issue: Vol. 23, no. 4, pp. 359–378

Abstract

The performance of most state-of-the-art speaker recognition (SR) systems deteriorates under degraded conditions because of mismatch between the training and testing sessions. This study focuses on the front end of the speaker verification (SV) system to reduce that mismatch. An adaptive voice activity detection (VAD) algorithm using a zero-frequency filter assisted peaking resonator (ZFFPR) was integrated into the front end of the SV system. The performance of the proposed SV system was studied under degraded conditions with 50 speakers selected from the NIST 2003 database. Degraded conditions were simulated by adding different types of noise from the NOISEX-92 database to the original speech utterances at signal-to-noise ratio (SNR) levels from 0 to 20 dB. The features were the widely used 39-dimension Mel frequency cepstral coefficients (MFCCs; i.e., 13-dimension MFCCs augmented with 13-dimension velocity and 13-dimension acceleration coefficients), and a Gaussian mixture model–universal background model (GMM-UBM) was used for speaker modeling. The proposed system was compared against an SV system that used energy-based VAD at its front end. The proposed SV system showed encouraging results when the empirical mode decomposition (EMD)-based VAD was used at its front end.
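As a rough illustration of the degradation setup and the baseline described in the abstract, the sketch below scales a noise signal so that the mixture reaches a target signal-to-noise ratio (SNR), then applies a generic frame-energy VAD. It is not the authors' implementation. The function names, the 200-sample frame length, and the relative energy threshold are all assumptions made for this example, and a synthetic tone with Gaussian noise stands in for the NIST 2003 utterances and NOISEX-92 recordings.

```python
import math
import random


def power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    # Solve 10*log10(P_speech / (g^2 * P_noise)) = snr_db for the gain g.
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def energy_vad(signal, frame_len=200, rel_threshold=0.1):
    """Flag a frame as speech when its energy exceeds a fraction of the peak frame energy.

    Frame length and threshold are illustrative assumptions, not values from the paper.
    """
    starts = range(0, len(signal) - frame_len + 1, frame_len)
    energies = [power(signal[i:i + frame_len]) for i in starts]
    if not energies:
        return []
    peak = max(energies)
    return [e > rel_threshold * peak for e in energies]


if __name__ == "__main__":
    rng = random.Random(0)
    fs = 8000  # assumed sampling rate for the toy signal
    # Toy "utterance": 0.1 s silence, 0.2 s of a 200 Hz tone, 0.1 s silence.
    tone = [math.sin(2 * math.pi * 200 * t / fs) for t in range(1600)]
    speech = [0.0] * 800 + tone + [0.0] * 800
    noise = [rng.gauss(0.0, 1.0) for _ in range(len(speech))]
    for snr_db in (0, 5, 10, 15, 20):
        noisy = mix_at_snr(speech, noise, snr_db)
        print(snr_db, "dB ->", sum(energy_vad(noisy)), "frames flagged as speech")
```

Here the SNR is computed over the whole utterance, silence included. That is one common convention, and the abstract does not say which one the study used.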

Published in Journal of Intelligent Systems

ISSN: 0334-1860 (Print); 2191-026X (Online)
Publisher: De Gruyter
Country of publisher: Poland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.degruyter.com/view/journals/jisys/jisys-overview.xml
