Significance of relative phase features for shouted and normal speech classification

Khomdet Phapatanaburi; Longbiao Wang; Meng Liu; Seiichi Nakagawa; Talit Jumphoo; Peerapong Uthansakul

doi:10.1186/s13636-023-00324-4

EURASIP Journal on Audio, Speech, and Music Processing (Jan 2024)

Affiliations

Khomdet Phapatanaburi: Department of Telecommunication Engineering, Faculty of Engineering and Technology, Rajamangala University of Technology Isan
Longbiao Wang: Tianjin Key Laboratory of Cognitive Computing and Application, College of Intelligence and Computing, Tianjin University
Meng Liu: Tianjin Key Laboratory of Cognitive Computing and Application, College of Intelligence and Computing, Tianjin University
Seiichi Nakagawa: Faculty of Engineering, Chubu University, Kasugai
Talit Jumphoo: School of Telecommunication Engineering, Suranaree University of Technology
Peerapong Uthansakul: School of Telecommunication Engineering, Suranaree University of Technology

Journal volume & issue: Vol. 2024, no. 1, pp. 1–14

Abstract

Shouted and normal speech classification plays an important role in many speech-related applications. Existing work typically relies on magnitude-based features and ignores phase-based features, which are directly related to magnitude information. This paper explores the importance of phase-based features for the detection of shouted speech. The novel contributions of this work are as follows. (1) Three phase-based features, namely the relative phase (RP), linear prediction analysis estimated speech-based RP (LPAES-RP), and linear prediction residual-based RP (LPR-RP) features, are explored for shouted and normal speech classification. (2) A new RP feature, called the glottal source-based RP (GRP) feature, is proposed. Its main idea is to exploit the difference between the RP and LPAES-RP features to detect shouted speech. (3) A score combination of phase- and magnitude-based features is employed to further improve classification performance. The proposed feature and combination are evaluated on the shouted normal electroglottograph speech (SNE-Speech) corpus. The experimental findings show that the RP, LPAES-RP, and LPR-RP features provide promising results for the detection of shouted speech, and that the proposed GRP feature provides better results than the standard mel-frequency cepstral coefficient (MFCC) feature. Moreover, compared to using individual features, the score combination of the MFCC and RP/LPAES-RP/LPR-RP/GRP features yields improved detection performance. Performance analysis under noisy environments shows that the score combination of the MFCC and the RP/LPAES-RP/LPR-RP features gives more robust classification. These outcomes show the importance of RP features in distinguishing shouted speech from normal speech.

Published in EURASIP Journal on Audio, Speech, and Music Processing

ISSN: 1687-4722 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Science: Physics: Acoustics. Sound; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://asmp-eurasipjournals.springeropen.com
