Machine Learning with Applications (Sep 2023)

DS-GAU: Dual-sequences gated attention unit architecture for text-independent speaker verification

  • Tsung-Han Tsai,
  • Tran Dang Khoa

Journal volume & issue
Vol. 13
p. 100469

Abstract


Text-independent speaker verification identifies people from their voice characteristics. In this paper, we propose a new method, the Dual-Sequences Gated Attention Unit (DS-GAU), to improve the accuracy of large-scale speaker verification systems. The DS-GAU builds on the Gated Dual Attention Unit and the Gated Recurrent Unit, and takes two different inputs from the same source: the statistics-pooling output of the x-vector and the frame-level information of the x-vector. It applies an attention mechanism to the traditional Gated Recurrent Unit to enhance the learning ability of the x-vector system. The whole system gathers statistics pooling from each time-delay neural network (TDNN) layer of the x-vector baseline and passes it through the DS-GAU layer, aggregating information from varying temporal contexts of the input features while training at the frame level. We train our model on VoxCeleb2 and evaluate it on VoxCeleb1 and the Speakers in the Wild (SITW) dataset. Finally, we compare the system with the x-vector, L-vector, and ETDNN-OPGRUs x-vector baselines, and our proposed method shows a clear improvement. Compared with the x-vector system, the fusion system achieves a relative equal error rate improvement of at least 17.5% on VoxCeleb1 and 0.5% on Speakers in the Wild.
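To make the described pipeline concrete, the sketch below shows one plausible reading of a dual-sequence gated attention unit in PyTorch: a GRU cell steps over frame-level TDNN features while, at each step, attending over per-layer statistics-pooling vectors and gating the hidden state with the attended context. This is only a minimal illustration of the idea as stated in the abstract; the module name, dimensions, and the exact gating/attention formulation are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DualSequenceGAUSketch(nn.Module):
    """Hypothetical sketch: a GRU-style unit whose hidden state is gated
    by attention over a second sequence (per-layer pooled statistics)."""

    def __init__(self, frame_dim, stats_dim, hidden_dim):
        super().__init__()
        self.gru = nn.GRUCell(frame_dim, hidden_dim)
        # Attention scores over the statistics-pooling sequence,
        # conditioned on the current hidden state (assumed design).
        self.attn = nn.Linear(hidden_dim + stats_dim, 1)
        # Gate that mixes the attended context back into the hidden state.
        self.gate = nn.Linear(hidden_dim + stats_dim, hidden_dim)

    def forward(self, frames, stats):
        # frames: (T, B, frame_dim)  frame-level TDNN features
        # stats:  (S, B, stats_dim)  statistics-pooling vector per TDNN layer
        batch = frames.size(1)
        h = frames.new_zeros(batch, self.gru.hidden_size)
        for x_t in frames:  # step over the frame sequence
            h = self.gru(x_t, h)
            # Attention weights over the S pooled vectors.
            scores = self.attn(torch.cat(
                [h.unsqueeze(0).expand(stats.size(0), -1, -1), stats],
                dim=-1))
            alpha = torch.softmax(scores, dim=0)    # (S, B, 1)
            ctx = (alpha * stats).sum(dim=0)        # (B, stats_dim)
            # Gate the hidden state with the attended context.
            h = h * torch.sigmoid(self.gate(torch.cat([h, ctx], dim=-1)))
        return h  # utterance-level representation

# Example usage (all shapes are illustrative):
frames = torch.randn(200, 8, 512)   # 200 frames, batch of 8
stats = torch.randn(5, 8, 1024)     # pooled stats from 5 TDNN layers
gau = DualSequenceGAUSketch(512, 1024, 256)
emb = gau(frames, stats)            # (8, 256) embedding
```

Under this reading, the second input sequence lets each time step draw on statistics from every TDNN layer rather than only the final one, which matches the abstract's claim of aggregating information from varying temporal contexts.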

Keywords