Journal of Intelligent Systems (Sep 2013)
Voice Activity Detection Algorithm Using Zero Frequency Filter Assisted Peaking Resonator and Empirical Mode Decomposition
Abstract
In this article, a new adaptive, data-driven strategy for voice activity detection (VAD) using empirical mode decomposition (EMD) is proposed. Speech data are decomposed in the time domain using EMD, an a posteriori, adaptive, data-driven method, to yield a set of physically meaningful intrinsic mode functions (IMFs). Each IMF preserves the nonlinear and nonstationary properties of the speech utterance. Among the set of IMFs, the one that predominantly contains source information, called the characteristic IMF (CIMF), is identified and extracted by designing a zero-frequency filter-assisted peaking resonator. The energy of the detected CIMF is then computed using short-term processing. With a properly chosen threshold, voiced regions in the speech utterance are detected from the frame energy. The proposed framework has been studied on both clean and noisy speech utterances (0-dB white noise), and it shows encouraging results in the presence of white noise up to 0 dB.
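The final stage described above, short-term energy computation followed by thresholding, can be illustrated with a minimal sketch. It assumes the CIMF has already been extracted and is available as a list of samples. The function names, the frame length and hop (160 and 80 samples), and the relative threshold (10% of the maximum frame energy) are illustrative assumptions, not values taken from the article; the EMD and resonator stages are not shown.

```python
import math


def frame_energies(signal, frame_len, hop):
    """Short-term energy: sum of squared samples in each full frame."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies


def detect_voiced(signal, frame_len=160, hop=80, rel_threshold=0.1):
    """Mark a frame as voiced when its energy exceeds a threshold.

    The threshold rule (a fraction of the maximum frame energy) is an
    assumption made for this sketch; the article only states that a
    proper threshold is chosen.
    """
    energies = frame_energies(signal, frame_len, hop)
    if not energies:
        return []
    threshold = rel_threshold * max(energies)
    return [e > threshold for e in energies]


# Toy stand-in for a CIMF: silence, a 200 Hz tone, then silence (8 kHz rate).
fs = 8000
silence = [0.0] * 800
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(800)]
toy_cimf = silence + tone + silence

decisions = detect_voiced(toy_cimf)
```

Each entry of `decisions` corresponds to one frame; consecutive `True` entries mark a candidate voiced region. In practice the threshold would need tuning to the noise level.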
Keywords