Enhancing Speaker Recognition Models with Noise-Resilient Feature Optimization Strategies

Neha Chauhan; Tsuyoshi Isshiki; Dongju Li

doi:10.3390/acoustics6020024

Acoustics (May 2024)

Affiliations

Neha Chauhan: Department of Information and Communication Engineering, Tokyo Institute of Technology, Tokyo 152-8550, Japan
Tsuyoshi Isshiki: Department of Information and Communication Engineering, Tokyo Institute of Technology, Tokyo 152-8550, Japan
Dongju Li: Department of Information and Communication Engineering, Tokyo Institute of Technology, Tokyo 152-8550, Japan

Journal volume & issue: Vol. 6, no. 2
pp. 439–469

Abstract

This paper explores speaker recognition methods, focusing on three approaches: feature-level fusion; dimension reduction using principal component analysis (PCA) and independent component analysis (ICA); and feature optimization using a genetic algorithm (GA) and the marine predator algorithm (MPA). The study conducts comprehensive experiments on several speech datasets that differ in noise level and number of speakers, and the results are strong across datasets and classifiers. For instance, on the TIMIT babble noise dataset (120 speakers), feature fusion achieves a speaker identification accuracy of 92.7%, while various feature optimization techniques combined with k-nearest neighbor (KNN) and linear discriminant (LD) classifiers yield a speaker verification equal error rate (SV EER) of 0.7%. On the TIMIT babble noise dataset with 630 speakers, a KNN classifier with feature optimization achieves a speaker identification accuracy of 93.5% and an SV EER of 0.13%. On the TIMIT white noise dataset (120 and 630 speakers), PCA dimension reduction combined with MPA feature optimization (PCA-MPA) and KNN classifiers attains speaker identification accuracies of 93.3% and 83.5%, with SV EER values of 0.58% and 0.13%, respectively. Furthermore, on the VoxCeleb1 dataset, PCA-MPA feature optimization with KNN classifiers achieves a speaker identification accuracy of 95.2% and an SV EER of 1.8%. These findings show that feature optimization strategies substantially improve both computational speed and speaker recognition performance.
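The abstract reports speaker verification results as equal error rates (EER): the operating point at which the false acceptance rate (FAR) and the false rejection rate (FRR) are equal. An SV EER of 0.13% therefore means that, at that threshold, roughly 13 in 10,000 impostor trials are accepted and about the same fraction of genuine trials are rejected. The Python sketch below shows one common way to approximate the EER by sweeping a decision threshold over the observed scores. It is an illustrative assumption, not the authors' evaluation code: the function name `equal_error_rate`, the convention that higher scores mean "same speaker", and the toy scores are all hypothetical.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Hypothetical helper (not from the paper). Assumes higher scores mean a
    trial is more likely to be the claimed speaker.
    genuine:  scores for same-speaker trials.
    impostor: scores for different-speaker trials.
    Returns (eer, threshold): the mean of FAR and FRR at the threshold
    where the two rates are closest.
    """
    if not genuine or not impostor:
        raise ValueError("need both genuine and impostor scores")
    best = None  # (|FAR - FRR|, EER estimate, threshold)
    for t in sorted(set(genuine) | set(impostor)):
        # A trial is accepted when its score is >= t.
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors wrongly accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine speakers wrongly rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


if __name__ == "__main__":
    # Hypothetical toy scores, not data from the paper.
    genuine_scores = [0.9, 0.8, 0.7, 0.4]
    impostor_scores = [0.1, 0.2, 0.3, 0.6]
    print(equal_error_rate(genuine_scores, impostor_scores))  # → (0.25, 0.6)
```

Averaging FAR and FRR at the closest observed threshold is a simple approximation. With many trials, interpolating the point where the FAR and FRR curves cross gives a smoother estimate.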

Published in Acoustics

ISSN: 2624-599X (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Physics
Website: https://www.mdpi.com/journal/acoustics
