Comparison of Modern Deep Learning Models for Speaker Verification

Vitalii Brydinskyi; Yuriy Khoma; Dmytro Sabodashko; Michal Podpora; Volodymyr Khoma; Alexander Konovalov; Maryna Kostiak

doi:10.3390/app14041329

Applied Sciences (Feb 2024)

Affiliations

Vitalii Brydinskyi: Institute of Computer Technologies, Automation and Metrology, Lviv Polytechnic National University, Bandery 12, 79013 Lviv, Ukraine
Yuriy Khoma: Institute of Computer Technologies, Automation and Metrology, Lviv Polytechnic National University, Bandery 12, 79013 Lviv, Ukraine
Dmytro Sabodashko: Institute of Computer Technologies, Automation and Metrology, Lviv Polytechnic National University, Bandery 12, 79013 Lviv, Ukraine
Michal Podpora: Department of Computer Science, Opole University of Technology, Proszkowska 76, 45-758 Opole, Poland
Volodymyr Khoma: Institute of Computer Technologies, Automation and Metrology, Lviv Polytechnic National University, Bandery 12, 79013 Lviv, Ukraine
Alexander Konovalov: Vidby AG, Suurstoffi 8, 6343 Risch-Rotkreuz, Switzerland
Maryna Kostiak: Institute of Computer Technologies, Automation and Metrology, Lviv Polytechnic National University, Bandery 12, 79013 Lviv, Ukraine

DOI: https://doi.org/10.3390/app14041329
Journal volume & issue: Vol. 14, no. 4
p. 1329

Abstract

This research presents an extensive comparative analysis of popular deep speaker embedding models, namely WavLM, TitaNet, ECAPA, and PyAnnote, applied to speaker verification. The study uses a specially curated dataset designed to mirror the real-world operating conditions of voice models as closely as possible. The dataset consists of short, non-English utterances gathered from interviews on a popular online video platform. It covers 50 unique speakers (33 male and 17 female) aged 20 to 70, a variety that supports thorough testing of speaker verification models. Each speaker contributes 10 clips of no more than 10 s each, for 500 recordings in total. The total length of all recordings is about 1 h and 30 min, which averages to roughly 100 s per speaker. This makes the dataset a valuable tool for speaker verification research, particularly for studies involving short audio clips.

The performance of the models is evaluated using common biometric metrics: false acceptance rate (FAR), false rejection rate (FRR), equal error rate (EER), and detection cost function (DCF). The results show that the TitaNet and ECAPA models achieve the lowest EER (1.91% and 1.71%, respectively) and thus produce more discriminative embeddings, reducing the intra-class distance (same speaker) while maximizing the distance between embeddings of different speakers. The analysis also highlights the ECAPA model’s advantageous balance of performance and efficiency: its inference time is 69.43 milliseconds, slightly longer than that of the PyAnnote models.

This study not only compares the performance of the models but also provides a comparative analysis of their embeddings, offering insights into their strengths and weaknesses. The findings serve as a foundation for guiding future research in speaker verification, especially in the context of short audio samples or limited data. This may be particularly relevant for applications requiring quick and accurate speaker identification from short voice clips.
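
As an illustration of the metrics named in the abstract, the following is a minimal sketch of how FAR, FRR, EER, and DCF can be computed from verification scores. It is not the authors' evaluation code. The function names, the use of cosine similarity as the score, the threshold scan over observed scores, and the DCF parameters (`p_target`, `c_miss`, `c_fa`) are assumptions made for the example; the abstract does not state which settings were used.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (assumed scoring rule)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR) at a threshold; a trial is accepted if score >= threshold."""
    far = sum(s >= threshold for s in impostor) / len(impostor)  # impostors accepted
    frr = sum(s < threshold for s in genuine) / len(genuine)     # genuine rejected
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate the EER by scanning every observed score as a threshold.

    Returns (eer, threshold) at the point where |FAR - FRR| is smallest.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


def detection_cost(far, frr, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Detection cost: C_miss * FRR * P_target + C_fa * FAR * (1 - P_target).

    The default parameter values are illustrative assumptions.
    """
    return c_miss * frr * p_target + c_fa * far * (1.0 - p_target)


# Toy scores, invented for illustration only (not data from the study).
genuine_scores = [0.9, 0.8, 0.7, 0.4]    # same-speaker trials
impostor_scores = [0.5, 0.3, 0.2, 0.1]   # different-speaker trials

eer, eer_threshold = equal_error_rate(genuine_scores, impostor_scores)
far, frr = far_frr(genuine_scores, impostor_scores, eer_threshold)
dcf = detection_cost(far, frr)
```

In this sketch a lower EER means that same-speaker and different-speaker score distributions overlap less, which is the sense in which the abstract ties low EER to more discriminative embeddings.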

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
