Deep ATC speaker recognition based on voiceprint aggregation

LI Yin-xuan, TANG Wen-yi, YANG Tao, WANG Xue-chuan, LI Cheng-xiang

doi:10.3969/j.issn.1673-3819.2023.02.018

Zhihui kongzhi yu fangzhen (Apr 2023)

Affiliations

LI Yin-xuan, TANG Wen-yi, YANG Tao, WANG Xue-chuan, LI Cheng-xiang: 1. Beijing Capital International Airport, Beijing 100621; 2. State Key Laboratory of Air Traffic Management System and Technology of Nanjing Research Institute of Electronic Engineering, Nanjing 210007, China

DOI: https://doi.org/10.3969/j.issn.1673-3819.2023.02.018
Journal volume & issue: Vol. 42, no. 2
pp. 112 – 115

Abstract

To address the problem of air traffic control (ATC) speaker recognition, a method based on voiceprint feature aggregation is proposed that can distinguish different speakers in an audio stream. First, we develop a ResNet spectrogram feature extractor and a NetVLAD feature fusion module, both of which are seldom used in speaker recognition. Second, we integrate these two modules into a novel end-to-end speaker recognition framework derived from the classic X-VECTORS method. Finally, the accuracy of the proposed method and of the baseline method is compared and analyzed on a real ATC voice dataset. The results show that the voiceprint aggregation method achieves higher recognition accuracy than the X-VECTORS network.
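
The abstract does not give implementation details, so the following is only a minimal sketch of the general VLAD-style aggregation idea that NetVLAD builds on, not the authors' network. It assumes frame-level features have already been extracted (for example by a ResNet over a spectrogram) and shows how a variable number of frames can be pooled into one fixed-length, L2-normalized utterance embedding. The function name `netvlad_aggregate`, the parameter names, and all sizes are hypothetical, and the parameters are random here rather than learned.

```python
import numpy as np

def netvlad_aggregate(frames, centers, weights, biases, eps=1e-12):
    """Pool frame-level features into one fixed-length vector (VLAD-style).

    frames:  (T, D) frame-level features, T may vary per utterance
    centers: (K, D) cluster centers (learned in a real network; random here)
    weights: (D, K) and biases: (K,) parameters of the soft-assignment layer
    returns: (K * D,) L2-normalized embedding
    """
    # Soft assignment of every frame to every cluster (softmax over clusters).
    logits = frames @ weights + biases                      # (T, K)
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    assign = np.exp(logits)
    assign = assign / assign.sum(axis=1, keepdims=True)     # (T, K)

    # Residual of each frame to each center, weighted by its assignment.
    residuals = frames[:, None, :] - centers[None, :, :]    # (T, K, D)
    vlad = (assign[:, :, None] * residuals).sum(axis=0)     # (K, D)

    # Intra-normalization per cluster, then global L2 normalization.
    vlad = vlad / (np.linalg.norm(vlad, axis=1, keepdims=True) + eps)
    flat = vlad.reshape(-1)
    return flat / (np.linalg.norm(flat) + eps)

# Hypothetical sizes: 8-dimensional frame features, 4 clusters.
rng = np.random.default_rng(0)
D, K = 8, 4
centers = rng.normal(size=(K, D))
weights = rng.normal(size=(D, K))
biases = np.zeros(K)

short_utt = netvlad_aggregate(rng.normal(size=(50, D)), centers, weights, biases)
long_utt = netvlad_aggregate(rng.normal(size=(300, D)), centers, weights, biases)
print(short_utt.shape, long_utt.shape)
```

Both utterances map to vectors of the same length regardless of their frame count, which is what lets an aggregation layer of this kind stand in for the statistics pooling step of an x-vector style pipeline.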

Keywords: TDNN, feature aggregation, VLAD, ATC voice

Published in Zhihui kongzhi yu fangzhen

ISSN: 1673-3819 (Print)
Publisher: Editorial Office of Command Control and Simulation
Country of publisher: China
LCC subjects: Military Science
Website: https://www.zhkzyfz.cn/EN/home
