IEEE Access (Jan 2024)

Mel-Scale Frequency Extraction and Classification of Dialect-Speech Signals With 1D CNN Based Classifier for Gender and Region Recognition

  • Hsiang-Yueh Lai,
  • Chia-Chieh Hu,
  • Chia-Hung Wen,
  • Jian-Xing Wu,
  • Neng-Sheng Pai,
  • Cheng-Yu Yeh,
  • Chia-Hung Lin

DOI: https://doi.org/10.1109/ACCESS.2024.3430296
Journal volume & issue: Vol. 12, pp. 102962–102976

Abstract


Humans communicate and interact through natural languages, such as American English (AE), Taiwanese, Italian, and numerous variants of Spanish. Through automatic speech analysis and recognition technologies, human-machine interaction systems (HMISs) can support language learning in query systems, smart devices, and healthcare applications, underscoring the need to enhance user interaction across different sectors. Because people differ in basic attributes (e.g., gender, age group, and spoken dialect), an HMIS must be able to identify a speaker's gender, age group, and regional dialect from their speech signals. To achieve automatic speech recognition, we analyzed and distinguished feature patterns using a feature extraction method and identified gender and region using a convolutional neural network (CNN)-based classifier. Mel-frequency cepstral coefficients were used to extract Mel-scale frequencies (MSF) from dialect-sentence speech signals for conversion into specific feature patterns. Subsequently, a one-dimensional CNN-based classifier was used to identify these feature patterns by gender and regional dialect. The proposed speech classifier was rigorously trained, tested, and validated using dialect-sentence speech corpora from AE, Italian (IT), and Spanish (SP) acoustic-phonetic continuous speech databases. The experimental results indicate that the proposed model with MSF features can perform accurate gender and region recognition. The classifier was evaluated in terms of precision (%), recall (%), F1 score, and accuracy (%).
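The mel-scale extraction step the abstract describes can be illustrated with a minimal sketch: converting frequencies between Hz and the mel scale and building a triangular mel filterbank, the front-end stage of MFCC computation. The formulas below are the standard HTK-style mel conversion, and the parameter values (26 filters, 512-point FFT, 16 kHz sampling) are illustrative assumptions, not the paper's reported configuration.

```python
import numpy as np

def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale (HTK formula)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel: mel scale back to Hz."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters=26, n_fft=512, sample_rate=16000):
    """Build triangular filters spaced evenly on the mel scale.

    Returns an (n_filters, n_fft // 2 + 1) matrix that maps a power
    spectrum to mel-band energies; taking the log of those energies
    followed by a DCT yields the MFCC feature patterns fed to a
    1D CNN classifier.
    """
    low_mel, high_mel = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = np.linspace(low_mel, high_mel, n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    # Map each filter edge to its nearest FFT bin index.
    bins = np.floor((n_fft + 1) * hz_points / sample_rate).astype(int)

    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):        # rising slope
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):       # falling slope
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank
```

A sanity check on the conversion: 1000 Hz maps to roughly 1000 mel by construction of the formula, and the two conversions are exact inverses.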

Keywords