IEEE Access (Jan 2020)
Analysis of Linguistic and Prosodic Features of Bilingual Arabic–English Speakers for Speech Emotion Recognition
Abstract
Speech emotion recognition (SER) research has usually focused on speakers' native languages, most commonly European and Asian languages. In the present study, a bilingual Arabic–English speech emotion database was elicited from 16 male and 16 female Egyptian participants to investigate how linguistic and prosodic features are affected by anger, fear, happiness and sadness in Arabic and English emotional speech. The linguistic analysis indicated that the participants preferred to express their emotions indirectly, mainly through religious references, and that the female participants tended to use more tentative and emotionally expressive language, while the male participants tended to use more assertive and independent language. In the prosodic analysis, statistical t-tests showed that pitch, intensity and speech rate were more indicative of anger and happiness, less relevant to fear, and scarcely significant for sadness. Speech emotion recognition performed using a linear support vector machine (SVM) with AdaBoost supported these results. Comparing the participants' first and second languages, there was no significant difference in the choice of words and structures used to express the different emotions. In terms of prosodic features, however, the females' speech showed higher pitch in Arabic in all cases, while both genders showed similar intensity values in the two languages and a faster speech rate in Arabic than in English.
Keywords