Jisuanji kexue yu tansuo (Jul 2021)
Speaker Verification Combining Total Variability Space and Time Delay Neural Network
Abstract
With short utterances, the total variability space cannot adequately estimate the probability distribution of speech, which degrades speaker verification performance. To address this problem, a method is proposed that enhances speaker identity vectors by combining the total variability space with a time delay neural network (TDNN). The method learns the linear correlation between the speaker embeddings of the total variability space and those of the TDNN, projects both embeddings into a new space, and combines the projections into a new speaker supervector that carries enhanced speaker information. In the training phase, the total variability space and the TDNN are trained separately. A new set of unrelated speakers is then created, i-vectors and x-vectors are extracted from it, and the projection matrix is obtained by canonical correlation analysis (CCA). In the enrollment and test phases, the embeddings of the enrollment and test speakers are extracted and mapped into the new space through the projection matrix, and the combined vectors enhance the speaker identity information. Experiments with short enrollment and short test utterances show that the fused vector achieves a significantly lower equal error rate than the baseline i-vector and x-vector.
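The CCA-based fusion described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names `fit_cca` and `fuse`, the regularisation term `reg`, the embedding dimensions, and the synthetic data standing in for real i-vectors and x-vectors are all assumptions made for illustration. The sketch also assumes that "combining" means concatenating the two projected embeddings, and it learns one projection matrix per embedding type, as standard CCA does.

```python
import numpy as np

def fit_cca(X, Y, n_components, reg=1e-6):
    """Fit CCA on paired row vectors X (n, dx) and Y (n, dy).

    X stands in for i-vectors and Y for x-vectors from a held-out speaker set.
    """
    mean_x, mean_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mean_x, Y - mean_y
    n = X.shape[0]
    # Covariance estimates; `reg` (an assumption) keeps them well conditioned.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Cholesky factors give the whitening transforms: Sxx = Lx Lx^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    # Whitened cross-covariance M = Lx^{-1} Sxy Ly^{-T}.
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T
    # Singular values of M are the canonical correlations.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    Wx = np.linalg.solve(Lx.T, U[:, :n_components])
    Wy = np.linalg.solve(Ly.T, Vt.T[:, :n_components])
    return mean_x, mean_y, Wx, Wy, s[:n_components]

def fuse(ivec, xvec, model):
    """Project both embeddings into the CCA space and concatenate them."""
    mean_x, mean_y, Wx, Wy, _ = model
    a = (ivec - mean_x) @ Wx
    b = (xvec - mean_y) @ Wy
    return np.concatenate([a, b], axis=-1)

# Synthetic stand-ins: a shared 5-dim latent plus noise, seen through two views.
rng = np.random.default_rng(0)
z = rng.standard_normal((510, 5))
ivecs = z @ rng.standard_normal((5, 20)) + 0.5 * rng.standard_normal((510, 20))
xvecs = z @ rng.standard_normal((5, 30)) + 0.5 * rng.standard_normal((510, 30))

# The first 500 pairs play the role of the held-out speaker set used to fit CCA.
model = fit_cca(ivecs[:500], xvecs[:500], n_components=5)

# A remaining pair plays the role of an enrollment (or test) utterance.
fused = fuse(ivecs[500], xvecs[500], model)
print(fused.shape)
```

The fused vectors would then be passed to whatever scoring back end the system uses; the abstract does not specify one, so none is shown here.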
Keywords