Advances in Electrical and Computer Engineering (May 2018)
Combination of Long-Term and Short-Term Features for Age Identification from Voice
Abstract
In this paper, we propose to use Gaussian mixture model (GMM) supervectors in a feed-forward deep neural network (DNN) for age identification from voice. The GMM is trained with short-term mel-frequency cepstral coefficients (MFCC). The proposed GMM/DNN method is compared with a feed-forward DNN and a recurrent neural network (RNN) in which the MFCC features are directly used. We also make a comparison with the classical GMM and GMM/support vector machine (SVM) methods. Baseline results are obtained with a set of long-term features which are commonly used for age identification in previous studies. A feed-forward DNN and an SVM are trained using the long term features. All the systems are tested using a speech database which consists of 228 female and 156 male speakers. We define three age classes for each gender; young, adult and senior. In the experiments, the proposed GMM/DNN significantly outperforms all the other DNN types. Its performance is only comparable to the GMM/SVM method. On the other hand, experimental results show that age identification performance is significantly improved when the decisions of the short-term and long-term systems are combined together. We obtain approximately 4% absolute improvement with the combination compared to the best standalone system.
Keywords