How many Mel‐frequency cepstral coefficients to be utilized in speech recognition? A study with the Bengali language

Md. Rakibul Hasan; Md. Mahbub Hasan; Md Zakir Hossain

doi:10.1049/tje2.12082

The Journal of Engineering (Dec 2021)

Affiliations

Md. Rakibul Hasan: Department of Electrical and Electronic Engineering, Khulna University of Engineering & Technology, Khulna, Bangladesh
Md. Mahbub Hasan: Department of Electrical and Electronic Engineering, Khulna University of Engineering & Technology, Khulna, Bangladesh
Md Zakir Hossain: Agriculture and Food, ML&AI FSP, Commonwealth Scientific and Industrial Research Organisation, Black Mountain, Canberra, Australia

Journal volume & issue: Vol. 2021, no. 12
pp. 817–827

Abstract

Speech‐related research has a wide range of applications, and most of it employs Mel‐frequency cepstral coefficients (MFCCs) as acoustic features. However, the optimum number of MFCCs remains an open research question. In this study, MFCC‐based speech classification was performed for both vowels and words in the Bengali language, using a deep neural network (DNN) trained with the Adam optimizer as the classification model. Performance was measured at different numbers of MFCCs with four‐fold cross‐validation, using five performance metrics: the confusion matrix, classification accuracy, the area under the receiver operating characteristic curve (AUC‐ROC), the F1 score, and Cohen's kappa. All metrics gave the best scores at 24 or 25 MFCCs; hence, the optimum number of MFCCs is suggested to be 25, although many existing studies use only 13. Furthermore, it is verified that increasing the number of MFCCs improves the classification metrics at a lower computational cost than adding hidden layers. Lastly, the optimum number of MFCCs obtained in this study was used in an improved DNN model, which achieved 99% and 90% accuracy for vowel and word classification, respectively; the vowel classification score outperformed state‐of‐the‐art results.
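The abstract does not detail the feature-extraction pipeline, so the following is only a minimal sketch of what "the number of MFCCs" controls, assuming the standard MFCC recipe (framing, windowing, power spectrum, mel filterbank, logarithm, DCT). In that recipe, the count is how many leading coefficients of a type-II discrete cosine transform (DCT) of one frame's log mel-filterbank energies are kept; librosa, for example, exposes it as the `n_mfcc` argument of `librosa.feature.mfcc`. The helper names `dct2_ortho` and `mfcc_from_log_mel`, the 40-band input frame, and the orthonormal DCT scaling are illustrative assumptions, not the authors' configuration.

```python
import math
from typing import List, Sequence


def dct2_ortho(x: Sequence[float]) -> List[float]:
    """Orthonormal DCT-II (same scaling as scipy.fft.dct(x, type=2, norm="ortho"))."""
    n = len(x)
    coeffs = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * s)
    return coeffs


def mfcc_from_log_mel(log_mel: Sequence[float], n_mfcc: int) -> List[float]:
    """Keep the first n_mfcc DCT-II coefficients of one frame's log mel energies.

    Hypothetical helper: log_mel is assumed to be already computed
    (framing, windowing, FFT, mel filterbank and log are not shown).
    """
    if not 1 <= n_mfcc <= len(log_mel):
        raise ValueError("n_mfcc must be between 1 and the number of mel bands")
    return dct2_ortho(log_mel)[:n_mfcc]


# Hypothetical log mel energies for one frame with 40 mel bands.
log_mel_frame = [math.log(1.0 + 0.5 * b) for b in range(40)]
mfcc_13 = mfcc_from_log_mel(log_mel_frame, 13)  # the commonly used count
mfcc_25 = mfcc_from_log_mel(log_mel_frame, 25)  # the count suggested in the abstract
print(len(mfcc_13), len(mfcc_25))  # 13 25
print(mfcc_25[:13] == mfcc_13)     # True: truncation only drops higher-order terms
```

Because choosing the count is a truncation, the 25-coefficient vector contains the 13-coefficient vector as its prefix; the 12 extra values describe finer, faster-varying detail of the log mel spectral envelope.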

Published in The Journal of Engineering

ISSN: 2051-3305 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Technology: Engineering (General). Civil engineering (General)
Website: https://ietresearch.onlinelibrary.wiley.com/journal/20513305