Journal of King Saud University: Computer and Information Sciences (Jul 2022)
Enhanced Indonesian Ethnic Speaker Recognition using Data Augmentation Deep Neural Network
Abstract
Speaker recognition is a challenging topic in speech processing research. Various proposed models have achieved fairly high accuracy. However, speaker recognition accuracy has not yet been maximized, because small datasets remain a problem, causing overfitting and biased data samples. This work proposes a data augmentation strategy that combines white-noise addition, pitch shifting, and time stretching, with the augmented data processed by a Deep Neural Network, yielding a new speaker recognition model called DA-DNN7L. Data augmentation is used to increase the limited quantity of Indonesian ethnic speaker data, while the seven-layer DNN is the architecture that gave the best accuracy compared with other multilayer models; seven-layer approaches in several other studies have also achieved high accuracy. The best-performing configuration of the seven-layer data augmentation Deep Neural Network reached an accuracy of 99.76% and a loss of 0.05 with a 70%:30% split ratio and the addition of 400 augmented data samples. These results indicate that the Data Augmentation Deep Neural Network can improve speaker recognition performance on the Indonesian ethnic dataset.
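To make the augmentation idea concrete, the following is a minimal sketch and not the authors' implementation. It shows white-noise addition at a chosen signal-to-noise ratio, plus a naive linear-interpolation resampler that changes duration and pitch together. The resampler is only a simplified stand-in: true pitch shifting and time stretching, as named in the abstract, change one property while preserving the other and typically need a phase vocoder or a dedicated audio library. The function names, the `snr_db` and `rate` parameters, and the synthetic sine input are all assumptions made for illustration.

```python
import math
import random


def add_white_noise(signal, snr_db, rng):
    """Return a copy of `signal` with Gaussian white noise added.

    `snr_db` is a hypothetical parameter: the target signal-to-noise
    ratio in decibels. `rng` is a random.Random instance.
    """
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def resample_linear(signal, rate):
    """Naive speed change by linear interpolation (illustrative only).

    rate > 1 shortens the signal and rate < 1 lengthens it. Duration and
    pitch change together, so this is neither a pure time stretch nor a
    pure pitch shift.
    """
    n_out = max(1, int(round(len(signal) / rate)))
    out = []
    for i in range(n_out):
        pos = i * rate
        lo = int(pos)
        if lo >= len(signal) - 1:
            # Past the last interpolable pair: hold the final sample.
            out.append(signal[-1])
        else:
            frac = pos - lo
            out.append(signal[lo] * (1 - frac) + signal[lo + 1] * frac)
    return out


# Usage on a synthetic 440 Hz tone sampled at 16 kHz (stand-in for speech).
rng = random.Random(0)
tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1000)]
augmented = [
    add_white_noise(tone, snr_db=20, rng=rng),
    resample_linear(tone, rate=1.1),
    resample_linear(tone, rate=0.9),
]
```

Each original utterance yields several perturbed copies, which is how augmentation enlarges a small training set; the augmented copies would then be passed through feature extraction and to the classifier.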