Robust Deep Speaker Recognition: Learning Latent Representation with Joint Angular Margin Loss

Labib Chowdhury; Hasib Zunair; Nabeel Mohammed

doi:10.3390/app10217522

Applied Sciences (Oct 2020)

Affiliations

Labib Chowdhury: Department of Electrical & Computer Engineering, North South University, Bashundhara, Dhaka-1229, Bangladesh
Hasib Zunair: Gina Cody School of Engineering and Computer Science, Concordia University, Montreal, QC H3G, Canada
Nabeel Mohammed: Department of Electrical & Computer Engineering, North South University, Bashundhara, Dhaka-1229, Bangladesh

DOI: https://doi.org/10.3390/app10217522
Journal volume & issue: Vol. 10, no. 21, p. 7522

Abstract

Speaker identification is gaining popularity, with notable applications in security, automation, and authentication. Deep-convolutional-network-based approaches, such as SincNet, are used for speaker identification as an alternative to i-vectors. SincNet performs convolution with parameterized sinc functions and has demonstrated superior results in this area. The system optimizes a softmax loss, which is integrated into the classification layer responsible for making predictions. Because this loss only increases interclass distance, it is not always an optimal design choice for biometric-authentication tasks such as face and speaker recognition. To address this limitation, this study proposes a family of models that improve upon the state-of-the-art SincNet model. The proposed models, AF-SincNet, Ensemble-SincNet, and ALL-SincNet, serve as potential successors to SincNet. They are compared on several speaker-recognition datasets, including TIMIT and LibriSpeech, each with its own unique challenges, and show performance improvements over competitive baselines. In interdataset evaluation, the best reported model not only consistently outperformed the baselines and prior models, but also generalized well to unseen and diverse tasks such as Bengali speaker recognition.
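
The abstract does not state which angular margin formulation the proposed models use, so the following is only an illustrative sketch of one common variant, an additive angular margin applied to the target-class angle before softmax cross-entropy. The function names, the `margin` and `scale` values, and the toy data are assumptions made for this example and are not taken from the paper. The sketch also ignores the edge case where the target angle plus the margin exceeds π.

```python
import numpy as np

def angular_margin_logits(embeddings, weights, labels, margin=0.5, scale=30.0):
    """Hypothetical helper: scaled cosine logits with an additive angular margin.

    embeddings: (batch, dim) speaker embeddings
    weights:    (classes, dim) class weight vectors
    labels:     (batch,) integer speaker ids
    """
    # L2-normalise so the dot product equals the cosine of the angle.
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(e @ w.T, -1.0, 1.0)
    theta = np.arccos(cos)
    # Add the margin only to the angle of the true class.
    rows = np.arange(len(labels))
    theta[rows, labels] += margin
    return scale * np.cos(theta)

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy computed with a stable log-sum-exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Toy data (assumed): two speakers, two-dimensional embeddings.
emb = np.array([[1.0, 0.0], [0.0, 1.0]])
cls = np.array([[1.0, 0.1], [0.1, 1.0]])
lab = np.array([0, 1])

plain_logits = angular_margin_logits(emb, cls, lab, margin=0.0)
margin_logits = angular_margin_logits(emb, cls, lab, margin=0.5)
loss_plain = cross_entropy(plain_logits, lab)
loss_margin = cross_entropy(margin_logits, lab)
```

With `margin=0.0` the function reduces to a normalised softmax. With a positive margin the same embeddings incur a higher loss, so training has to pull each embedding closer to its own class direction, which is the intraclass compactness that a plain softmax loss does not explicitly enforce.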

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
