A Novel Convolutional Neural Network Model for Automatic Speaker Identification From Speech Signals

J. Arun Pandian; Ramkumar Thirunavukarasu; Evans Kotei

doi:10.1109/ACCESS.2024.3385858

IEEE Access (Jan 2024)

Affiliations

J. Arun Pandian: School of Computer Science Engineering and Information Systems, Vellore Institute of Technology, Vellore, India
Ramkumar Thirunavukarasu: School of Computer Science Engineering and Information Systems, Vellore Institute of Technology, Vellore, India
Evans Kotei: Department of Computer Science, Kumasi Technical University, Kumasi, Ghana

DOI: https://doi.org/10.1109/ACCESS.2024.3385858
Journal volume & issue: Vol. 12, pp. 51381–51394

Abstract

Since voice communication is the most frequent form of human communication, it is necessary to identify each speaker in a discussion. The development of advanced computer processing units, deep learning techniques, and publicly available datasets motivates new automated speaker recognition systems. This study presents a novel Convolutional Neural Network (CNN) called Five Convolutional Blocks-CNN (5C-CNN) for automatic speaker identification from speech signals. The proposed 5C-CNN model comprises five convolutional blocks and one dense block and identifies speakers from their English speech audio. The model was trained on English speech signals collected from ten individuals. The study also applied audio augmentation techniques, including white Gaussian noise, shifting, stretching, stretching with high frequency, altering speed, and changing pitch values, to introduce diversity into the dataset. The HyperBand tuning technique was used to optimize the training parameters of the proposed approach. The proposed 5C-CNN achieved a classification accuracy of 99.34% on the test data of the proposed person speech dataset. To assess generalization, the approach was also tested on the benchmark dataset THUYG-20, where it achieved an accuracy of 95.43%. Both experiments confirm the efficiency of the proposed technique in the audio classification task when compared with existing approaches and standard machine learning techniques.
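To make the augmentation step more concrete, the following is a minimal sketch of three of the listed transformations (white Gaussian noise, shifting, and altering speed) on a raw waveform stored as a list of floats. The abstract does not give the authors' implementation, so the function names, the SNR parameterization, the circular shift, and the linear-interpolation resampling are all assumptions made for illustration. The naive speed change shown here also changes pitch; pitch-preserving stretching and pitch shifting are not covered.

```python
import math
import random


def add_white_noise(signal, snr_db, rng):
    """Add white Gaussian noise scaled to a target SNR in dB (hypothetical helper)."""
    power = sum(x * x for x in signal) / len(signal)
    noise_power = power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def time_shift(signal, offset):
    """Circularly shift the waveform by `offset` samples (positive = delay)."""
    offset %= len(signal)
    return signal[-offset:] + signal[:-offset] if offset else list(signal)


def change_speed(signal, rate):
    """Resample by linear interpolation; rate > 1 shortens (speeds up) the clip."""
    n_out = max(1, int(round(len(signal) / rate)))
    out = []
    for i in range(n_out):
        pos = min(i * rate, len(signal) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out


# Illustrative input: a synthetic tone, not data from the study.
rng = random.Random(0)
tone = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
noisy = add_white_noise(tone, snr_db=20, rng=rng)
shifted = time_shift(tone, 10)
faster = change_speed(tone, 2.0)
```

Each function returns a new list and leaves the input untouched, so several augmented copies can be derived from one recording to enlarge a small speaker dataset.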

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
