Jurnal Infotel (Feb 2022)

Frequency Domain Analysis of MFCC Feature Extraction in Children’s Speech Recognition System

  • Risanuri Hidayat

DOI
https://doi.org/10.20895/infotel.v14i1.740
Journal volume & issue
Vol. 14, no. 1
pp. 30 – 36

Abstract

Research on speech recognition systems currently focuses on robustness. When speech signals are mixed with noise, the recognition system struggles to identify the speech sounds, so the development of robust speech recognition systems continues. The principle of a robust speech recognition system is to eliminate noise from the speech signals and restore the original information signals. In this paper, the researchers conducted a frequency domain analysis of one stage of the Mel Frequency Cepstral Coefficients (MFCC) process, the Fast Fourier Transform (FFT), in a children's speech recognition system. The FFT analysis in the feature extraction process determined how the characteristics of the frequency values taken from the FFT output affect sensitivity to noise disruption. The analysis method was designed as three scenarios that differ in how the FFT points are partitioned: all FFT points were divided into four, three, and two parts in the first, second, and third scenarios, respectively. This study used children's speech data from the isolated-digit TIDIGITS English corpus. As comparative data, noise was added manually to simulate real-world conditions. The results showed that using a particular frequency portion in MFCC, following the designed scenarios, affected recognition performance, and the effect was relatively significant on the noisy speech data. The scenario 3 (C1) version of the designed method produced the highest accuracy, exceeding that of the conventional MFCC method. The average accuracy of scenario 3 (C1) increased by 1% across all the tested noise types. When tested at various noise intensities (signal-to-noise ratio, SNR), scenario 3 (C1) produced higher accuracy than conventional MFCC at every tested SNR value. This shows that the selection of the specific frequencies used in MFCC feature extraction significantly affects recognition accuracy on noisy speech.
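The two mechanical steps the abstract mentions, adding noise at a chosen SNR and dividing the FFT output into equal parts, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `add_noise_at_snr` and `split_spectrum`, the 400-sample frame, the 512-point FFT, and the use of a one-sided power spectrum are all assumptions made for illustration, and the abstract does not state which portion corresponds to "C1" or how the selected portion feeds the later MFCC stages.

```python
import numpy as np


def add_noise_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech + noise has the requested SNR in dB.

    Assumes `noise` is at least as long as `speech` (illustrative helper,
    not taken from the paper).
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # SNR_dB = 10 * log10(p_speech / (scale**2 * p_noise)), solved for scale
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise


def split_spectrum(frame, n_fft, n_parts):
    """Return the one-sided power spectrum of `frame`, split into `n_parts`
    contiguous, nearly equal frequency bands (lowest band first)."""
    spectrum = np.abs(np.fft.rfft(frame, n=n_fft)) ** 2
    return np.array_split(spectrum, n_parts)


# Synthetic stand-ins for a speech frame and a noise recording (assumed data)
rng = np.random.default_rng(0)
speech = rng.standard_normal(400)
noise = rng.standard_normal(400)

noisy = add_noise_at_snr(speech, noise, snr_db=10)

# Scenario 1 in the abstract divides the FFT points into four parts;
# use n_parts=3 or n_parts=2 for scenarios 2 and 3.
parts = split_spectrum(noisy, n_fft=512, n_parts=4)
```

Each element of `parts` is one contiguous frequency band. In a full pipeline, a selected band (or combination of bands) would replace the complete spectrum before mel filtering, but that selection rule would have to come from the paper itself.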

Keywords