Healthcare Analytics (Dec 2024)

Optimized early fusion of handcrafted and deep learning descriptors for voice pathology detection and classification

  • Roohum Jegan,
  • R. Jayagowri

Journal volume & issue
Vol. 6
p. 100369

Abstract

This study presents an automated, noninvasive approach to voice disorder detection and classification that uses an optimized fusion of modified glottal source estimation descriptors and deep transfer learning neural network descriptors. A new set of modified descriptors based on a glottal source estimator is proposed, together with features from a pre-trained Inception-ResNet-v2 convolutional neural network, for the speech disorder detection and classification task. The modified feature set is obtained using mel-cepstral coefficients, harmonic model, phase discrimination means, distortion deviation descriptors, conventional wavelet, and glottal source estimation features. Early descriptor-level fusion is employed to enhance performance; however, the fusion increases the dimensionality of the feature vector. A nature-inspired slime mould algorithm is therefore used to remove redundant features and select the most discriminative ones. Finally, classification is performed using the K-nearest neighbor (KNN) classifier. The proposed algorithm was evaluated in extensive experiments with different feature combinations, with and without feature selection, on two popular datasets: the Arabic Voice Pathology Database (AVPD) and the Saarbrucken Voice Database (SVD). We show that the proposed optimized fusion method attained a voice pathology detection accuracy of 98.46% on the SVD, covering a wide spectrum of voice disorders. Furthermore, compared to traditional handcrafted and deep neural network-based techniques, the proposed method demonstrates competitive performance with fewer features.
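
The pipeline described in the abstract (early fusion, feature selection, KNN classification) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the descriptors are assumed to be already extracted as plain lists of floats, and a simple random bit-flip search stands in for the slime mould algorithm, which is not reproduced here.

```python
import math
import random


def early_fusion(handcrafted, deep):
    """Descriptor-level fusion: concatenate the two vectors of each sample."""
    return [h + d for h, d in zip(handcrafted, deep)]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    ranked = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], query))
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)


def apply_mask(rows, mask):
    """Keep only the feature columns whose mask entry is True."""
    return [[v for v, keep in zip(row, mask) if keep] for row in rows]


def accuracy(train_x, train_y, val_x, val_y, mask, k=3):
    """Validation accuracy of KNN restricted to the selected features."""
    tr, va = apply_mask(train_x, mask), apply_mask(val_x, mask)
    hits = sum(knn_predict(tr, train_y, q, k) == y for q, y in zip(va, val_y))
    return hits / len(val_y)


def select_features(train_x, train_y, val_x, val_y, iters=200, seed=0):
    """Wrapper-style selection by random bit flips.

    Assumption: this is a placeholder for the slime mould algorithm. It only
    shares the goal of finding a smaller mask that does not lose accuracy.
    """
    rng = random.Random(seed)
    n = len(train_x[0])
    best = [True] * n
    best_score = accuracy(train_x, train_y, val_x, val_y, best)
    for _ in range(iters):
        cand = best[:]
        i = rng.randrange(n)
        cand[i] = not cand[i]
        if not any(cand):
            continue  # never evaluate an empty feature set
        score = accuracy(train_x, train_y, val_x, val_y, cand)
        fewer = sum(cand) < sum(best)
        # Accept higher accuracy, or equal accuracy with fewer features.
        if score > best_score or (score == best_score and fewer):
            best, best_score = cand, score
    return best, best_score
```

In this sketch the fused vectors from `early_fusion` would be split into training and validation sets, passed to `select_features` to obtain a boolean mask, and then classified with `knn_predict` on the masked features. The value `k=3` and the iteration count are arbitrary illustrative choices, not settings reported in the paper.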

Keywords