Optimized early fusion of handcrafted and deep learning descriptors for voice pathology detection and classification

Roohum Jegan; R. Jayagowri

Healthcare Analytics (Dec 2024)

Affiliations

Roohum Jegan: Correspondence to: Department of Artificial Intelligence and Data Science, SIES Graduate School of Technology, Nerul, Navi Mumbai, Maharashtra, 400706, India; Department of Electronics and Communication Engineering, BMS College of Engineering, Bengaluru, 560019, India
R. Jayagowri: Department of Electronics and Communication Engineering, BMS College of Engineering, Bengaluru, 560019, India

Journal volume & issue: Vol. 6, p. 100369

Abstract

This study presents an automated noninvasive voice disorder detection and classification approach using an optimized fusion of modified glottal source estimation and deep transfer learning neural network descriptors. A new set of modified descriptors based on a glottal source estimator and pre-trained Inception-ResNet-v2 convolutional neural network-based features are proposed for the speech disorder detection and classification task. The modified feature set is obtained using mel-cepstral coefficients, harmonic model, phase discrimination means, distortion deviation descriptors, conventional wavelet, and glottal source estimation features. Early descriptor-level fusion is employed in this study for performance enhancement; however, the fusion results in higher feature vector dimensionality. A nature-inspired slime mould algorithm is therefore utilized to remove redundant features and select the most discriminative ones. Finally, the classification is performed using the K-nearest neighbor (KNN) classifier. The proposed algorithm was evaluated in extensive experiments with different feature combinations, with and without feature selection, on two popular datasets: the Arabic Voice Pathology Database (AVPD) and the Saarbrucken Voice Database (SVD). We show that the proposed optimized fusion method attained an enhanced voice pathology detection accuracy of 98.46%, encompassing a wide spectrum of voice disorders on the SVD database. Furthermore, compared to traditional handcrafted and deep neural network-based techniques, the proposed method demonstrates competitive performance with fewer features.
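The pipeline described in the abstract (early fusion, feature selection, KNN classification) can be illustrated with a minimal sketch. This is not the authors' implementation. All function names are hypothetical, the data are synthetic toy values rather than AVPD or SVD features, and a plain random search over binary feature masks stands in for the slime mould algorithm, which is not reproduced here. Leave-one-out KNN accuracy is assumed as the selection criterion.

```python
import math
import random
from collections import Counter


def fuse(handcrafted, deep):
    """Early (descriptor-level) fusion: concatenate both feature vectors per sample."""
    return [h + d for h, d in zip(handcrafted, deep)]


def apply_mask(X, mask):
    """Keep only the feature columns whose mask entry is True."""
    return [[v for v, keep in zip(row, mask) if keep] for row in X]


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    nearest = sorted((math.dist(row, x), label) for row, label in zip(X_train, y_train))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k=3):
    """Leave-one-out KNN accuracy, used here as the feature-subset fitness."""
    hits = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += knn_predict(X_tr, y_tr, X[i], k) == y[i]
    return hits / len(X)


def random_search_select(X, y, n_iter=200, seed=0):
    """Stand-in for the slime mould algorithm: random search over binary masks.

    Prefers higher accuracy and breaks ties in favour of fewer features.
    """
    rng = random.Random(seed)
    n_feat = len(X[0])
    best_mask, best_score = [True] * n_feat, loo_accuracy(X, y)
    for _ in range(n_iter):
        mask = [rng.random() < 0.5 for _ in range(n_feat)]
        if not any(mask):
            continue  # an empty feature subset cannot be evaluated
        score = loo_accuracy(apply_mask(X, mask), y)
        if (score, -sum(mask)) > (best_score, -sum(best_mask)):
            best_mask, best_score = mask, score
    return best_mask, best_score


# Synthetic toy data: one informative and one noisy feature per descriptor family.
rng = random.Random(42)
labels = [0] * 10 + [1] * 10
handcrafted = [[lab + rng.gauss(0, 0.1), rng.gauss(0, 1)] for lab in labels]
deep = [[rng.gauss(0, 1), 2 * lab + rng.gauss(0, 0.1)] for lab in labels]

fused = fuse(handcrafted, deep)
mask, score = random_search_select(fused, labels)
print("fused dimensionality:", len(fused[0]))
print("selected mask:", mask, "leave-one-out accuracy:", score)
```

The sketch mirrors the trade-off noted in the abstract: concatenation increases the feature vector dimensionality, and the selection step then searches for a smaller subset that does not reduce classifier accuracy.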

Published in Healthcare Analytics

ISSN: 2772-4425 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://www.journals.elsevier.com/healthcare-analytics
