Applied Sciences (Jul 2022)

Pseudo-Phoneme Label Loss for Text-Independent Speaker Verification

  • Mengqi Niu,
  • Liang He,
  • Zhihua Fang,
  • Baowei Zhao,
  • Kai Wang

DOI
https://doi.org/10.3390/app12157463
Journal volume & issue
Vol. 12, no. 15
p. 7463

Abstract


Compared with text-independent speaker verification (TI-SV) systems, text-dependent speaker verification (TD-SV) counterparts often achieve better performance because they efficiently exploit speech content information. For this reason, some TI-SV methods try to boost performance by incorporating an extra automatic speech recognition (ASR) component to exploit content information, such as the c-vector. However, the introduced ASR component requires a large amount of annotated data and consumes substantial computational resources. In this paper, we propose a pseudo-phoneme label (PPL) loss for the TI-SV task that integrates a content cluster loss at the frame level and a speaker recognition loss at the segment level in a unified network through multitask learning, without requiring additional data or excessive computation. Following HuBERT, we generate pseudo-phoneme labels to shape the frame-level feature distribution via deep clustering, so that each cluster corresponds to an implicit pronunciation unit in the feature space. We compare the proposed loss with the softmax loss, center loss, triplet loss, log-likelihood-ratio cost loss, additive margin softmax loss, and additive angular margin loss on the VoxCeleb database. Experimental results demonstrate the effectiveness of the proposed method.
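To make the multitask formulation concrete, below is a minimal sketch (not the authors' implementation) of a joint objective that combines a frame-level pseudo-phoneme classification loss with a segment-level speaker classification loss. All module names, dimensions, the plain cross-entropy losses, and the weighting factor `alpha` are illustrative assumptions rather than details taken from the paper; the pseudo-phoneme labels are assumed to come from offline k-means clustering of frame features, in the spirit of HuBERT.

```python
# Sketch of a multitask loss: frame-level pseudo-phoneme classification
# plus segment-level speaker classification. Hypothetical architecture and
# hyperparameters; for illustration only.
import torch
import torch.nn as nn


class MultiTaskSpeakerModel(nn.Module):
    def __init__(self, feat_dim=80, hidden_dim=256, num_clusters=100,
                 num_speakers=1000, emb_dim=192):
        super().__init__()
        # Shared frame-level encoder (stand-in for the paper's front-end network).
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # Frame-level head: predicts the pseudo-phoneme cluster of each frame.
        self.frame_head = nn.Linear(hidden_dim, num_clusters)
        # Segment-level head: mean-pools frames into an utterance embedding,
        # then classifies the speaker.
        self.emb = nn.Linear(hidden_dim, emb_dim)
        self.spk_head = nn.Linear(emb_dim, num_speakers)

    def forward(self, x):
        # x: (batch, frames, feat_dim) acoustic features
        h = self.encoder(x)                   # (B, T, H)
        frame_logits = self.frame_head(h)     # (B, T, num_clusters)
        seg_emb = self.emb(h.mean(dim=1))     # (B, emb_dim)
        spk_logits = self.spk_head(seg_emb)   # (B, num_speakers)
        return frame_logits, spk_logits


def multitask_loss(frame_logits, pseudo_labels, spk_logits, spk_labels, alpha=0.1):
    """Weighted sum of segment-level speaker loss and frame-level pseudo-phoneme loss."""
    ce = nn.CrossEntropyLoss()
    frame_loss = ce(frame_logits.flatten(0, 1), pseudo_labels.flatten())
    spk_loss = ce(spk_logits, spk_labels)
    return spk_loss + alpha * frame_loss


if __name__ == "__main__":
    model = MultiTaskSpeakerModel()
    x = torch.randn(4, 200, 80)                  # 4 utterances, 200 frames each
    pseudo = torch.randint(0, 100, (4, 200))     # offline cluster IDs (HuBERT-style)
    spk = torch.randint(0, 1000, (4,))
    frame_logits, spk_logits = model(x)
    loss = multitask_loss(frame_logits, pseudo, spk_logits, spk)
    loss.backward()
    print(float(loss))
```

In the paper the segment-level branch is compared against margin-based speaker losses (e.g., additive margin softmax); the plain cross-entropy used here is only a placeholder for whichever speaker loss is paired with the frame-level cluster term.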

Keywords