IEEE Access (Jan 2021)
Investigation and Evaluation of Glottal Flow Waveform for Voice Pathology Detection
Abstract
Automatic voice pathology detection can provide objective estimation and prevention in the early stages of voice diseases. The glottal flow waveform directly reflects the state of glottal excitation, so acoustic features extracted from glottal source signals may contribute to the detection of pathological voice. To improve the performance of voice pathology detection, this article investigates the contribution of the glottal flow waveform by comparing classification results obtained with features extracted from raw speech utterances and from the corresponding glottal flow waveforms. The individual feature sets are extracted from raw or glottal voice utterances with identical parameter settings: openSMILE acoustic features, audio features computed according to the Moving Picture Experts Group-7 (MPEG-7) standard, and classical glottal source features. In addition, a wrapper-based feature selection method is used to combine the individual features, which are ranked by the Fisher discrimination ratio. Voice pathology detection experiments were carried out using Random Forest. The best accuracies, 88.52% for the Saarbrücken Voice database and 100.00% for the Massachusetts Eye and Ear Infirmary database, are achieved using the combined feature set extracted from the glottal source signal, an improvement of 0.44-3.13% over the accuracies obtained using raw speech utterances. Compared to state-of-the-art methods, the proposed method achieves the highest accuracy for the Massachusetts Eye and Ear Infirmary database and an increase of 2.75-17.16% in detection accuracy over other conventional pipeline systems for the Saarbrücken Voice database. The experimental results demonstrate that using the glottal flow waveform as the source signal can improve the performance of pathological voice detection.
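The ranking-then-wrapper selection step described in the abstract can be illustrated with a minimal sketch. It assumes the common two-class form of the Fisher discrimination ratio, (mu_0 - mu_1)^2 / (var_0 + var_1), and a greedy wrapper that keeps a ranked feature only if it raises the evaluation score; the abstract does not give the exact formula or acceptance rule, so both are assumptions. The function names (fisher_ratio, rank_features, wrapper_select, centroid_accuracy) and the toy data are hypothetical, and a nearest-centroid classifier scored on training data stands in for the Random Forest used in the paper purely to keep the example dependency-free.

```python
from statistics import mean, pvariance


def fisher_ratio(values, labels):
    """Two-class Fisher discrimination ratio of one feature (assumed form)."""
    a = [v for v, lab in zip(values, labels) if lab == 0]
    b = [v for v, lab in zip(values, labels) if lab == 1]
    denom = pvariance(a) + pvariance(b)
    return (mean(a) - mean(b)) ** 2 / denom if denom > 0 else 0.0


def rank_features(X, y):
    """Return feature indices sorted by descending Fisher ratio."""
    n_features = len(X[0])
    scores = [fisher_ratio([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def wrapper_select(X, y, evaluate):
    """Greedy wrapper: walk the ranking, keep a feature only if the score rises."""
    selected, best = [], 0.0
    for j in rank_features(X, y):
        score = evaluate(X, y, selected + [j])
        if score > best:
            selected, best = selected + [j], score
    return selected, best


def centroid_accuracy(X, y, cols):
    """Stand-in evaluator (NOT Random Forest): nearest-centroid training accuracy."""
    def project(row):
        return [row[j] for j in cols]

    centroids = {}
    for c in (0, 1):
        rows = [project(r) for r, lab in zip(X, y) if lab == c]
        centroids[c] = [mean(col) for col in zip(*rows)]

    def predict(row):
        p = project(row)
        return min(
            centroids,
            key=lambda c: sum((u - v) ** 2 for u, v in zip(p, centroids[c])),
        )

    return mean(1.0 if predict(r) == lab else 0.0 for r, lab in zip(X, y))


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
X = [[0.1, 5.0], [0.2, 1.0], [0.0, 3.0], [1.0, 4.0], [1.1, 2.0], [0.9, 3.0]]
y = [0, 0, 0, 1, 1, 1]

ranking = rank_features(X, y)
selected, best = wrapper_select(X, y, centroid_accuracy)
print(ranking, selected, best)
```

In a real experiment the evaluate callable would wrap the actual classifier and report held-out or cross-validated accuracy rather than training accuracy; it is passed in as a parameter so the selection loop stays independent of that choice.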
Keywords