IEEE Access (Jan 2024)
A Novel Convolutional Neural Network Model for Automatic Speaker Identification From Speech Signals
Abstract
Because voice is the most frequent form of human communication, it is necessary to identify each speaker in a discussion. Advances in computer processing units, deep learning techniques, and publicly available datasets motivate the development of new automated speaker recognition systems. This study presents a novel Convolutional Neural Network (CNN), called Five Convolutional Blocks-CNN (5C-CNN), for automatic speaker identification from speech signals. The proposed 5C-CNN model comprises five convolutional blocks and one dense block and identifies speakers from their English speech audio. The model was trained on English speech signals collected from ten individuals. To introduce diversity into the dataset, the study applied several audio augmentation techniques: white Gaussian noise, shifting, stretching, stretching with high frequency, altering speed, and changing pitch values. The HyperBand tuning technique was used to optimize the training parameters of the proposed approach. The proposed 5C-CNN achieved a classification accuracy of 99.34% on the test data of the proposed person speech dataset. To assess generalization, the approach was also tested on the benchmark dataset THUYG-20, where it achieved an accuracy of 95.43%. Both experiments confirm the effectiveness of the proposed technique for audio classification in comparison with existing approaches and standard machine learning techniques.
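As an illustration of three of the augmentation techniques named in the abstract (white Gaussian noise, shifting, and altering speed), the following is a minimal NumPy sketch. It is not the authors' implementation: the function names, the SNR-based noise level, the circular shift, the linear-interpolation resampling, and the synthetic tone used as input are all assumptions made for this example.

```python
import numpy as np

def add_white_noise(signal, snr_db, rng):
    """Add white Gaussian noise at approximately the requested SNR in dB (assumed parameterization)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

def time_shift(signal, shift_samples):
    """Circularly shift the waveform by a number of samples."""
    return np.roll(signal, shift_samples)

def change_speed(signal, rate):
    """Resample by linear interpolation; rate > 1 shortens the clip (and raises its pitch)."""
    n_out = int(round(len(signal) / rate))
    old_idx = np.arange(len(signal))
    new_idx = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(new_idx, old_idx, signal)

# Hypothetical input: one second of a 220 Hz tone at 16 kHz stands in for a speech clip.
sr = 16000
t = np.arange(sr) / sr
clip = 0.5 * np.sin(2 * np.pi * 220 * t)
rng = np.random.default_rng(0)

noisy = add_white_noise(clip, snr_db=20, rng=rng)
shifted = time_shift(clip, shift_samples=1600)
faster = change_speed(clip, rate=1.25)
```

Each function returns a new array, so the original clip stays intact and several augmented copies can be generated from one recording. Resampling by interpolation changes speed and pitch together; changing one without the other, as the separate stretching and pitch techniques in the abstract suggest, would need a different method.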
Keywords