Gender and region detection from human voice using the three-layer feature extraction method with 1D CNN

Mohammad Amaz Uddin; Refat Khan Pathan; Md Sayem Hossain; Munmun Biswas

doi:10.1080/24751839.2021.1983318

Journal of Information and Telecommunication (Jan 2022)

Affiliations

Mohammad Amaz Uddin: BGC Trust University Bangladesh
Refat Khan Pathan: BGC Trust University Bangladesh
Md Sayem Hossain: BGC Trust University Bangladesh
Munmun Biswas: BGC Trust University Bangladesh

DOI: https://doi.org/10.1080/24751839.2021.1983318
Journal volume & issue: Vol. 6, no. 1, pp. 27–42

Abstract

Analysing the human voice has long been a challenge for the engineering community, with applications such as product review, emotional state detection, and AI development. Two basic tasks in voice or speech analysis are detecting the speaker's gender and identifying the speaker's geographical region from accent. This study presents a three-layer feature extraction method that works on the raw human voice to classify gender as male or female and to identify the region the voice comes from. The first layer computes fundamental frequency, spectral entropy, spectral flatness, and mode frequency. The second layer extracts Mel-frequency cepstral coefficients (MFCC), and the third layer uses linear predictive coding (LPC). Ordinary voice recordings contain noise, which is removed with multiple audio filtering steps to obtain smooth, noise-free data. A multi-output 1D convolutional neural network is used to recognize gender and region on a combined dataset consisting of the TIMIT, RAVDESS, and BGC datasets. The model predicts gender with 93.01% accuracy and region with 97.07% accuracy. The method outperforms the usual state-of-the-art methods on the separate datasets as well as on the combined dataset, for both gender and region classification.
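
As a rough illustration of the first-layer features named in the abstract, the following Python sketch computes spectral flatness, spectral entropy, and a peak ("mode") frequency for a single signal using NumPy. The function names, the Hann window, the scaling of entropy to [0, 1], and the reading of mode frequency as the strongest spectral bin are assumptions made here for illustration. The abstract does not give the authors' exact definitions or parameters, so this should not be read as their implementation.

```python
import numpy as np


def power_spectrum(signal, sample_rate):
    """Return (frequencies, power) for a Hann-windowed real signal."""
    windowed = signal * np.hanning(len(signal))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs, power


def spectral_flatness(power, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum."""
    p = power + eps  # eps avoids log(0) on empty bins
    return float(np.exp(np.mean(np.log(p))) / np.mean(p))


def spectral_entropy(power, eps=1e-12):
    """Shannon entropy of the normalized power spectrum, scaled to [0, 1]."""
    p = power / (np.sum(power) + eps)
    entropy = -np.sum(p * np.log2(p + eps))
    return float(entropy / np.log2(len(p)))


def mode_frequency(freqs, power):
    """Frequency of the strongest bin (assumed reading of 'mode frequency')."""
    return float(freqs[np.argmax(power)])


if __name__ == "__main__":
    # Synthetic stand-ins for audio: a pure 200 Hz tone and white noise.
    sample_rate = 8000
    t = np.arange(sample_rate) / sample_rate
    tone = np.sin(2 * np.pi * 200 * t)
    noise = np.random.default_rng(0).standard_normal(sample_rate)

    for name, sig in (("tone", tone), ("noise", noise)):
        freqs, power = power_spectrum(sig, sample_rate)
        print(name,
              "flatness:", spectral_flatness(power),
              "entropy:", spectral_entropy(power),
              "mode Hz:", mode_frequency(freqs, power))
```

A tonal signal concentrates its energy in a few bins, so it gives lower flatness and entropy than noise. In a pipeline like the one described, such values would typically be computed per frame and combined with the MFCC and LPC features before classification.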

Published in Journal of Information and Telecommunication

ISSN: 2475-1839 (Print); 2475-1847 (Online)
Publisher: Taylor & Francis Group
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Telecommunication; Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: https://www.tandfonline.com/journals/tjit
