An Impact of Narrowband Speech Codec Mismatch on a Performance of GMM-UBM Speaker Recognition over Telecommunication Channel

Jozef Polacky; Peter Pocta; Roman Jarina

doi:10.26552/com.C.2016.1.23-28

Communications (Feb 2016)

Affiliations

Jozef Polacky: Department of Telecommunications and Multimedia, Faculty of Electrical Engineering, University of Zilina, Slovakia
Peter Pocta: Department of Telecommunications and Multimedia, Faculty of Electrical Engineering, University of Zilina, Slovakia
Roman Jarina: Department of Telecommunications and Multimedia, Faculty of Electrical Engineering, University of Zilina, Slovakia

DOI: https://doi.org/10.26552/com.C.2016.1.23-28
Journal volume & issue: Vol. 18, no. 1
pp. 23 – 28

Abstract

Automatic identification of a person from their voice is part of modern telecommunication services. To carry out the identification task, the speech signal has to be transmitted to a remote server, so the performance of the recognition/identification system can be affected by various distortions that occur when the speech signal is transmitted through a communication channel. This paper studies the effect of the telecommunication channel, in particular the narrowband (NB) speech codecs commonly used in current telecommunication networks, on the performance of automatic speaker recognition in the context of a channel/codec mismatch between enrollment and test utterances. The influence of speech coding on speaker identification is assessed using the reference GMM-UBM method. The results show that the partially mismatched scenario offers better results than the fully matched scenario when speaker recognition is performed on speech utterances degraded by different NB codecs. Moreover, deploying the EVS and G.711 codecs in the training process of the recognition system provides the best success rate in the fully mismatched scenario. It should be noted that both the EVS and G.711 codecs offer the best speech quality among the codecs deployed in this study. This finding fully corresponds with the finding presented by Janicki & Staroszczyk in [1], which focused on other speech codecs.

Published in Communications

ISSN: 1335-4205 (Print); 2585-7878 (Online)
Publisher: University of Žilina
Country of publisher: Slovakia
LCC subjects: Social Sciences: Transportation and communications; Technology: Engineering (General). Civil engineering (General): Transportation engineering
Website: https://komunikacie.uniza.sk/
