AIMS Mathematics (Jan 2022)

DeltaVLAD: An efficient optimization algorithm to discriminate speaker embedding for text-independent speaker verification

  • Xin Guo,
  • Chengfang Luo,
  • Aiwen Deng,
  • Feiqi Deng

DOI
https://doi.org/10.3934/math.2022355
Journal volume & issue
Vol. 7, no. 4
pp. 6381 – 6395

Abstract

Text-independent speaker verification aims to determine whether two given utterances in an open-set task originate from the same speaker. In this paper, several ways are explored to enhance the discriminative power of embeddings in speaker verification. Firstly, a difference operation is used in the coding layer to process speaker features, forming the DeltaVLAD layer. The frame-level speaker representation is extracted by a deep neural network, and differential operations are applied to calculate the dynamic changes between frames, which is more conducive to capturing subtle changes in the voiceprint. Meanwhile, NeXtVLAD is adopted to split the frame-level features into multiple subspaces before aggregation and then perform VLAD operations in each subspace, which can significantly reduce the number of parameters and improve performance. Secondly, the margin-based softmax loss function and the few-shot learning-based loss function are combined to obtain more discriminative speaker embeddings. Finally, for a fair comparison, experiments are conducted on VoxCeleb1; the results show superior speaker verification performance and new state-of-the-art results.
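
The two ideas named in the abstract, frame-to-frame differences and VLAD-style aggregation, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the toy inputs, and the use of negative squared distance as the soft-assignment score are all assumptions made for illustration (in NetVLAD-style layers the assignment is learned), and the NeXtVLAD grouping step and the loss functions are omitted.

```python
import math

def frame_deltas(frames):
    """First-order differences between consecutive frame-level feature vectors."""
    return [[b - a for a, b in zip(prev, cur)]
            for prev, cur in zip(frames, frames[1:])]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def vlad(frames, centers):
    """Soft-assignment VLAD: weighted residuals summed per center, then L2-normalised."""
    dim = len(centers[0])
    agg = [[0.0] * dim for _ in centers]
    for x in frames:
        # Assumption: assignment score is the negative squared distance to each center.
        scores = [-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers]
        weights = softmax(scores)
        for k, (w, c) in enumerate(zip(weights, centers)):
            for d in range(dim):
                agg[k][d] += w * (x[d] - c[d])
    flat = [v for row in agg for v in row]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]

# Hypothetical toy input: three 2-dimensional frames and two cluster centers.
frames = [[0.0, 1.0], [1.0, 1.0], [3.0, 0.0]]
centers = [[0.0, 0.0], [2.0, 0.0]]
deltas = frame_deltas(frames)
embedding = vlad(deltas, centers)
```

Here the differences are taken before aggregation, so the fixed-length embedding summarises how the features change from frame to frame rather than their absolute values. Where exactly the difference is applied inside the DeltaVLAD layer is not specified in the abstract, so this ordering is also an assumption.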

Keywords