Journal of Engineering and Sustainable Development (Nov 2024)
An Enhanced Speech Command Recognition using Convolutional Neural Networks
Abstract
In recent years, growing interest in automatic speech recognition (ASR) has been driven by its wide-ranging applications across various domains. Integrating speech recognition technologies into smart systems highlights the pivotal role of human-machine interaction. This study introduces a robust ASR system that leverages convolutional neural networks (CNNs) in conjunction with Mel-frequency cepstral coefficients (MFCCs). The model's architecture was refined through an extensive hyperparameter search, enabling it to recognize ten different spoken commands effectively. The model was trained and evaluated on the Google Speech dataset, which comprises 65,000 audio clips collected from a wide range of speakers across the globe and captures the natural variation in speech found in real-world scenarios. The architecture comprises eight weight-bearing layers, including convolutional and fully connected layers, contains a total of 183,345 trainable weights, and uses ReLU activation. The average F1-scores obtained during the training, validation, and testing stages are 99.06%, 94.68%, and 95.27%, respectively. Furthermore, the proposed model achieves an improvement of approximately 1.3% in test accuracy over existing methods, confirming its effectiveness in real-world applications.
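To make the described MFCC + CNN pipeline concrete, the following is a minimal illustrative sketch using librosa for feature extraction and Keras for the classifier. The sample rate, number of MFCC coefficients, layer widths, and the file name `example_clip.wav` are assumptions for demonstration only; they do not reproduce the paper's exact eight-layer, 183,345-weight architecture.

```python
# Hypothetical sketch of an MFCC + CNN speech-command classifier.
# Assumes 1-second, 16 kHz clips and 40 MFCC coefficients; layer sizes
# are illustrative and not the paper's tuned configuration.
import numpy as np
import librosa
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_COMMANDS = 10      # ten spoken commands, as in the paper
SAMPLE_RATE = 16000    # assumed clip sample rate
N_MFCC = 40            # assumed number of MFCC coefficients

def extract_mfcc(path: str) -> np.ndarray:
    """Load a clip and return a fixed-size MFCC matrix (frames x coeffs x 1)."""
    signal, _ = librosa.load(path, sr=SAMPLE_RATE, duration=1.0)
    signal = np.pad(signal, (0, max(0, SAMPLE_RATE - len(signal))))  # pad to 1 s
    mfcc = librosa.feature.mfcc(y=signal, sr=SAMPLE_RATE, n_mfcc=N_MFCC)
    return mfcc.T[..., np.newaxis]

def build_model(input_shape) -> tf.keras.Model:
    """Small CNN with convolutional and fully connected layers and ReLU activations."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(NUM_COMMANDS, activation="softmax"),
    ])

if __name__ == "__main__":
    features = extract_mfcc("example_clip.wav")  # hypothetical input file
    model = build_model(features.shape)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()
```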
Keywords