Mehran University Research Journal of Engineering and Technology (Apr 2012)
Automatic Speaker Identification Using Clinically Depressed Speech Content
Abstract
The environment strongly affects the performance of automatic speaker recognition. This work investigates the effect of a clinical environment on the speaker recognition task. Two sets of speakers are used: a clinical set consisting of speech samples from 70 clinically depressed speakers, and a control set comprising 68 clinically non-depressed speakers. MFCCs (Mel Frequency Cepstral Coefficients) are used for feature extraction, and several modeling methods are compared: GMM-EM (Gaussian Mixture Models based on Expectation Maximization), GMM-Kmeans (GMM based on K-means), GMM-LBG (GMM based on Linde-Buzo-Gray), and GMM-ITVQ (GMM based on Information Theoretic Vector Quantization). These modeling methods are evaluated on the novel speech corpus. The results suggest that speaker recognition rates are lower for the depressed speakers (60-71%) than for the non-depressed speakers (79-89%). The paper further investigates the performance of VQ (Vector Quantization) based Gaussian modeling and proposes a novel approach called GMM-ITVQ. The results suggest that GMM-EM achieves the highest recognition rates; however, the performance of GMM-ITVQ is comparable to that of GMM-EM.
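To make the modeling step concrete, the following is a minimal sketch of closed-set speaker identification with one Gaussian mixture model per speaker, where k-means (a VQ step) supplies the initial means and EM then refines them. It is not the authors' implementation. The function names, the diagonal covariances, the variance floor, and the synthetic three-dimensional vectors standing in for MFCC frames are all assumptions made for illustration. The ITVQ variant is not shown, because the abstract does not describe it.

```python
import math
import random

def kmeans(frames, k, iters=10, seed=0):
    """Plain k-means (the VQ step); returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(frames, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in frames:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
            clusters[j].append(x)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (a - m) ** 2 / v
                      for a, m, v in zip(x, mean, var))

def logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def train_gmm(frames, k, em_iters=10, seed=0, var_floor=1e-3):
    """Fit a diagonal GMM: k-means initial means, then EM updates."""
    dim, n = len(frames[0]), len(frames)
    means = kmeans(frames, k, seed=seed)
    variances = [[1.0] * dim for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(em_iters):
        # E-step: posterior responsibility of each component for each frame
        resp = []
        for x in frames:
            logp = [math.log(weights[j]) + log_gauss(x, means[j], variances[j])
                    for j in range(k)]
            norm = logsumexp(logp)
            resp.append([math.exp(lp - norm) for lp in logp])
        # M-step: re-estimate weights, means, and floored variances
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-8:  # leave a starved component unchanged
                continue
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[d] for r, x in zip(resp, frames)) / nj
                        for d in range(dim)]
            variances[j] = [max(var_floor,
                                sum(r[j] * (x[d] - means[j][d]) ** 2
                                    for r, x in zip(resp, frames)) / nj)
                            for d in range(dim)]
    return weights, means, variances

def avg_loglik(frames, gmm):
    """Average per-frame log-likelihood of the frames under one GMM."""
    weights, means, variances = gmm
    total = sum(logsumexp([math.log(w) + log_gauss(x, m, v)
                           for w, m, v in zip(weights, means, variances)])
                for x in frames)
    return total / len(frames)

def identify(frames, models):
    """Return the enrolled speaker whose model scores the frames highest."""
    return max(models, key=lambda name: avg_loglik(frames, models[name]))

def synth_frames(center, n, seed):
    """Hypothetical stand-in for MFCC frames: Gaussian noise around a center."""
    rng = random.Random(seed)
    return [[rng.gauss(c, 1.0) for c in center] for _ in range(n)]

# Enroll two synthetic "speakers", then identify an unseen utterance.
train = {"spk_a": synth_frames([0.0, 0.0, 0.0], 200, 1),
         "spk_b": synth_frames([4.0, 4.0, 4.0], 200, 2)}
models = {name: train_gmm(frames, k=2) for name, frames in train.items()}
print(identify(synth_frames([0.0, 0.0, 0.0], 50, 3), models))
```

In this sketch, changing only how the initial means are chosen (random, k-means, or another VQ codebook) while keeping the scoring step fixed is the kind of comparison the abstract describes among the GMM variants.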