IEEE Access (Jan 2024)

Metric-Based Few-Shot Transfer Learning Approach for Voice Pathology Detection

  • Jong-Ho Won
  • Deok-Hwan Kim

DOI
https://doi.org/10.1109/ACCESS.2024.3480523
Journal volume & issue
Vol. 12
pp. 159226 – 159238

Abstract

Voice pathologies significantly affect social interactions and quality of life. Traditional diagnostic techniques, such as laryngoscopy, are invasive and cumbersome. This paper proposes a noninvasive deep learning approach that diagnoses voice pathologies from voice signal analysis, reducing patient discomfort and examination time. The main contribution of this study is the application of few-shot learning (FSL) to voice pathology detection, integrated with transfer learning to address data scarcity and class imbalance. The study employs a metric-based FSL approach to enhance model generalization with limited data and validates the method using both voice and electroglottographic (EGG) signals, thereby improving diagnostic accuracy and clinical applicability. In experiments on voice and EGG signals, the proposed method outperformed current FSL algorithms, achieving test accuracies of 73.7% for voice data and 82.6% for EGG data. These results highlight the robustness of the method, particularly for the more informative EGG signals. Future work includes further development and validation in real clinical settings.

Keywords