DeltaVLAD: An efficient optimization algorithm to discriminate speaker embedding for text-independent speaker verification

Xin Guo; Chengfang Luo; Aiwen Deng; Feiqi Deng

doi:10.3934/math.2022355

AIMS Mathematics (Jan 2022)

Affiliations

Xin Guo: 1. Guangdong Communication Polytechnic, Guangzhou 510650, China
Chengfang Luo: 2. School of Automation Science and Engineering, South China University of Technology, Guangzhou 510641, China
Aiwen Deng: 2. School of Automation Science and Engineering, South China University of Technology, Guangzhou 510641, China
Feiqi Deng: 2. School of Automation Science and Engineering, South China University of Technology, Guangzhou 510641, China

DOI: https://doi.org/10.3934/math.2022355
Journal volume & issue: Vol. 7, no. 4
pp. 6381 – 6395

Abstract

Text-independent speaker verification aims to determine whether two given utterances in an open-set task originate from the same speaker. This paper explores several ways to make speaker embeddings more discriminative. First, a difference operation is applied to the speaker features in the coding layer, forming the DeltaVLAD layer. The frame-level speaker representation extracted by the deep neural network is passed through differential operations that compute the dynamic changes between frames, which helps capture subtle changes in the voiceprint. In addition, NeXtVLAD is adopted to split the frame-level features into multiple subspaces before aggregation and then perform the VLAD operation in each subspace, which significantly reduces the number of parameters and improves performance. Second, a margin-based softmax loss function is combined with a few-shot learning-based loss function to obtain more discriminative speaker embeddings. Finally, for a fair comparison, experiments are conducted on Voxceleb-1; the proposed speaker verification system shows superior performance and achieves new state-of-the-art results.
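The two ideas in the coding layer, frame-to-frame differences and VLAD aggregation inside low-dimensional subspaces, can be illustrated with a small NumPy sketch. This is not the authors' DeltaVLAD layer: the function names, the shared cluster centers, the linear soft-assignment weights and the random inputs are all assumptions made for illustration, and the group attention used by the full NeXtVLAD design is omitted.

```python
import numpy as np


def frame_deltas(frames):
    """First-order difference between consecutive frames: (T, D) -> (T-1, D)."""
    return frames[1:] - frames[:-1]


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def grouped_vlad(frames, centers, assign_weights):
    """Simplified VLAD pooling over sub-vectors (hypothetical helper).

    frames:         (T, D) frame-level features
    centers:        (K, d) cluster centers shared by all groups, d divides D
    assign_weights: (d, K) assumed linear map producing soft-assignment logits
    returns:        (K * d,) L2-normalised utterance-level descriptor
    """
    T, D = frames.shape
    K, d = centers.shape
    assert D % d == 0, "feature size must be divisible by the subspace size"
    groups = D // d
    # Split every frame into `groups` consecutive sub-vectors of size d.
    sub = frames.reshape(T * groups, d)
    # Soft assignment of each sub-vector to the K clusters.
    a = softmax(sub @ assign_weights, axis=1)              # (T*groups, K)
    # Weighted sum of residuals: sum_i a[i, k] * (sub[i] - centers[k]).
    vlad = a.T @ sub - a.sum(axis=0)[:, None] * centers    # (K, d)
    # Normalise each cluster's residual, then the flattened descriptor.
    vlad = vlad / (np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12)
    flat = vlad.ravel()
    return flat / (np.linalg.norm(flat) + 1e-12)


# Toy usage with random data standing in for network outputs.
rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 16))       # 50 frames, 16-dim features
deltas = frame_deltas(frames)                # dynamic changes between frames
centers = rng.standard_normal((4, 4))        # K = 4 clusters, d = 4
assign_weights = rng.standard_normal((4, 4))
embedding = grouped_vlad(deltas, centers, assign_weights)
```

With subspaces of size d instead of the full dimension D, the cluster centers need K x d values rather than K x D, which is the kind of parameter saving the abstract refers to.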

Published in AIMS Mathematics

ISSN: 2473-6988 (Online)
Publisher: AIMS Press
Country of publisher: United States
LCC subjects: Science: Mathematics
Website: http://www.aimspress.com/journal/Math
