Zhihui kongzhi yu fangzhen (Apr 2023)

Deep ATC speaker recognition based on voiceprint aggregation

  • LI Yin-xuan, TANG Wen-yi, YANG Tao, WANG Xue-chuan, LI Cheng-xiang

DOI
https://doi.org/10.3969/j.issn.1673-3819.2023.02.018
Journal volume & issue
Vol. 42, no. 2
pp. 112 – 115

Abstract

To address air traffic control (ATC) speaker recognition, this paper proposes a method based on voiceprint feature aggregation that can distinguish different speakers in an audio stream. First, we develop a ResNet spectrogram feature extractor and a NetVLAD feature fusion module, both of which are seldom used in speaker recognition. Second, we insert these two modules into a novel end-to-end speaker recognition framework derived from the classic X-VECTORS method. Finally, the accuracy of the proposed method and that of the baseline method are compared and analyzed on a real ATC voice dataset. The results show that the voiceprint aggregation method achieves higher recognition accuracy than the X-VECTORS network.
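
The aggregation step can be illustrated with a minimal sketch. It assumes the commonly described NetVLAD formulation (soft assignment of each frame to K clusters, weighted residual sums, per-cluster normalization, then global normalization); the abstract does not give the authors' exact configuration. The function names, the parameter layout, and the toy values are hypothetical, the ResNet front end is omitted, and in a trained network the centers, weights, and biases would be learned rather than fixed by hand.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to (approximately) unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


def netvlad_pool(frames, centers, weights, biases):
    """Aggregate T frame-level descriptors (length D each) into one K*D vector.

    frames:  T x D list of frame descriptors (hypothetical extractor output)
    centers: K x D cluster centers
    weights: K x D soft-assignment weights
    biases:  K soft-assignment biases
    """
    k, d = len(centers), len(centers[0])
    vlad = [[0.0] * d for _ in range(k)]
    for x in frames:
        # Soft-assign this frame to every cluster.
        scores = [sum(wi * xi for wi, xi in zip(w, x)) + b
                  for w, b in zip(weights, biases)]
        assign = softmax(scores)
        # Accumulate assignment-weighted residuals to each center.
        for j in range(k):
            for i in range(d):
                vlad[j][i] += assign[j] * (x[i] - centers[j][i])
    # Intra-normalize each cluster row, flatten, then normalize globally.
    vlad = [l2_normalize(row) for row in vlad]
    flat = [v for row in vlad for v in row]
    return l2_normalize(flat)


# Toy parameters (hypothetical): K = 2 clusters, D = 2 feature dimensions.
centers = [[0.0, 0.0], [1.0, 1.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]
biases = [0.0, 0.0]

short_utt = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
long_utt = short_utt + [[0.5, 0.2], [0.3, 0.9]]

emb_short = netvlad_pool(short_utt, centers, weights, biases)
emb_long = netvlad_pool(long_utt, centers, weights, biases)
```

Both utterances, although of different lengths, map to embeddings of the same size (K*D), which is what lets a variable-length ATC transmission be compared against a fixed-size voiceprint.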

Keywords