Speaker Identification in Different Emotional States in Arabic and English

Ali Hamid Meftah; Hassan Mathkour; Said Kerrache; Yousef Ajami Alotaibi

doi:10.1109/ACCESS.2020.2983029

IEEE Access (Jan 2020)

Affiliations

Ali Hamid Meftah: College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
Hassan Mathkour: College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
Said Kerrache: College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
Yousef Ajami Alotaibi: College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia

DOI: https://doi.org/10.1109/ACCESS.2020.2983029
Journal volume & issue: Vol. 8
pp. 60070 – 60083

Abstract

Speaker recognition is an important application of digital speech processing. However, a major challenge that degrades the robustness of speaker-recognition systems is variation in the emotional state of speakers, such as happiness, anger, sadness, or surprise. In this paper, we propose a speaker-recognition system that addresses three conditions, namely emotional speech, neutral speech, and speech with no consideration for the speaker's state (i.e., the speaker can be in an emotional or a neutral state), for two languages: Arabic and English. Additionally, cross-language speaker recognition was applied in the emotional, neutral, and (emotional + neutral) states. Convolutional neural network and long short-term memory models were combined to design the main system, a convolutional recurrent neural network (CRNN). We also investigated the use of linearly spaced spectrograms as speech-feature inputs. The proposed system utilizes the KSUEmotions, Emotional Prosody Speech and Transcripts, WEST POINT, and TIMIT corpora. The CRNN system exhibited accuracies as high as 97.4% and 97.18% for Arabic and English emotional speech inputs, respectively, and 99.89% and 99.4% for Arabic and English neutral speech inputs, respectively. In the cross-language experiments, the overall CRNN system accuracy was as high as 91.83%, 99.88%, and 95.36% for the emotional, neutral, and (emotional + neutral) states, respectively.
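
The abstract names linearly spaced spectrograms as the speech-feature input but gives no extraction settings. As a rough illustration only, the sketch below computes a log-magnitude spectrogram whose frequency bins are evenly spaced in Hz (as opposed to a mel-warped axis), using NumPy. The function name `linear_spectrogram` and all parameters (a 400-sample Hann window, 160-sample hop, and 512-point FFT, i.e. 25 ms and 10 ms at 16 kHz) are assumptions chosen for the example and are not taken from the paper.

```python
import numpy as np

def linear_spectrogram(signal, sample_rate, frame_len=400, hop_len=160, n_fft=512):
    """Log-magnitude spectrogram on a linearly spaced frequency axis.

    Hypothetical helper; the framing parameters are illustrative defaults,
    not the settings used by the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        # Zero-pad very short inputs so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - signal.size))
    n_frames = 1 + (signal.size - frame_len) // hop_len
    window = np.hanning(frame_len)
    # Index matrix that slices the signal into overlapping frames:
    # shape (n_frames, frame_len).
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    frames = signal[idx] * window
    # rfft keeps the n_fft // 2 + 1 non-negative frequency bins,
    # which are evenly spaced in Hz.
    magnitude = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    log_spec = 20.0 * np.log10(magnitude + 1e-10)  # dB, with a floor to avoid log(0)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    return log_spec.T, freqs  # spectrogram shape: (n_bins, n_frames)

# Usage: one second of a 1 kHz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)
spec, freqs = linear_spectrogram(tone, sr)
print(spec.shape)           # (257, 98)
print(freqs[1] - freqs[0])  # 31.25 (Hz between adjacent bins)
```

The resulting two-dimensional array can be treated like an image, which is the usual way a spectrogram is passed to the convolutional layers of a CNN or CRNN.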

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
