Multi-modal voice pathology detection architecture based on deep and handcrafted feature fusion

Asli Nur Omeroglu; Hussein M.A. Mohammed; Emin Argun Oral

Engineering Science and Technology, an International Journal (Dec 2022)

Affiliations

Asli Nur Omeroglu (corresponding author): Ataturk University, Department of Electrical and Electronics Engineering, Yakutiye, Erzurum 25240, Turkey
Hussein M.A. Mohammed: Ataturk University, Department of Electrical and Electronics Engineering, Yakutiye, Erzurum 25240, Turkey
Emin Argun Oral (principal corresponding author): Ataturk University, Department of Electrical and Electronics Engineering, Yakutiye, Erzurum 25240, Turkey

Journal volume & issue: Vol. 36, p. 101148

Abstract

Automatic voice pathology detection systems can help clinicians by enabling objective assessment and diagnosis of voice pathologies at an early stage. This paper proposes a novel multi-modal architecture that uses both speech and electroglottography (EGG) signals and investigates their effectiveness in the automatic detection of voice pathology. The proposed framework combines two parallel Convolutional Neural Networks (CNNs), one for voice signals and the other for EGG signals, to obtain deep features. Classical handcrafted features are also extracted from both signals. The deep and handcrafted features are then concatenated to obtain a richer combined feature set, and a feature selection method is applied to remove redundant features. Finally, an SVM classifier is used to detect voice pathology. To measure the performance of the proposed system, various experiments are conducted on the Saarbruecken Voice Database (SVD) without excluding any available pathology or sample. The experimental results show that the proposed method achieves an accuracy of up to 90.10% using all speech and EGG samples, with sensitivity, specificity and F1-score of 92.9%, 84.6% and 92.57%, respectively. Under cross-validation testing on all SVD samples, the proposed method outperforms the methods reported in the literature. Hence, it is promising for automatic voice pathology detection applications.
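The four figures quoted in the abstract (accuracy, sensitivity, specificity and F1-score) are all derived from the counts of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code. It assumes that "pathological" is treated as the positive class, which the abstract does not state; the function name `detection_metrics` and the counts in the usage lines are hypothetical and chosen only for illustration.

```python
def detection_metrics(tp, fn, tn, fp):
    """Compute standard binary detection metrics from confusion counts.

    Assumption: the positive class is "pathological".
    tp: pathological samples detected as pathological
    fn: pathological samples missed (labelled healthy)
    tn: healthy samples labelled healthy
    fp: healthy samples labelled pathological
    """
    sensitivity = tp / (tp + fn)            # recall on the pathological class
    specificity = tn / (tn + fp)            # recall on the healthy class
    precision = tp / (tp + fp)              # fraction of detections that are correct
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for illustration only; these are not the paper's results.
metrics = detection_metrics(tp=90, fn=10, tn=40, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because the F1-score depends only on precision and sensitivity, it ignores the true negatives; this is why it can sit close to the sensitivity even when specificity is noticeably lower, as in the figures reported above.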

Published in Engineering Science and Technology, an International Journal

ISSN: 2215-0986 (Online)
Publisher: Elsevier
Country of publisher: Netherlands
LCC subjects: Technology: Engineering (General). Civil engineering (General)
Website: http://www.journals.elsevier.com/engineering-science-and-technology-an-international-journal/
