IEEE Access (Jan 2025)

A Novel Stacked Model for Classification of Vocal Cord Paralysis Over Imbalanced Vocal Data

  • K. Jayashree Hegde,
  • K. Manjula Shenoy,
  • K. Devaraja

DOI
https://doi.org/10.1109/ACCESS.2025.3525721
Journal volume & issue
Vol. 13
pp. 10559–10581

Abstract

Over time, many classification systems have been developed for voice-related disorders using machine learning methods, with limited use of deep learning techniques. These systems were evaluated on accuracy, F1-score, precision, and recall using Mel-Frequency Cepstral Coefficients (MFCCs), time-domain features, etc. They gave adequate results on vowel-recording datasets that were either balanced across all classes or in which multiple voice pathologies were pooled to match the number of healthy subjects. In real-world scenarios, however, voice-disorder data are often imbalanced and scarce. Vocal Cord Paralysis (VCP) is one such voice pathology with limited data. In this paper, a stacked ensemble model, InceptionV3-EfficientNetB0-ViT-B/16, is proposed to classify VCP and healthy subjects on an imbalanced dataset using spectrograms as features. Voice samples of the vowels /a/, /i/, and /u/ at neutral, high, low, and low-high-low pitch conditions, along with a phrase, are selected from the Saarbruecken Voice Database (SVD) for healthy and VCP subjects. The samples are preprocessed using the Short-Time Fourier Transform (STFT), and each sample is augmented at various frequencies. Experimental results show that the proposed stacked model achieved an excellent accuracy of 94.11% for the vowel /a/ at normal and low-high-low pitch conditions on the imbalanced dataset. In addition, the model's robustness and trustworthiness are supported by a False Discovery Rate of 0.07142, a Cohen's Kappa of 0.82105, a Matthews Correlation Coefficient (MCC) of 0.83452, and an F1-score of 0.91005. The vowels /i/ and /u/ were also evaluated with the proposed model, obtaining 88.23% accuracy for most pitch conditions and 90% for the phrase.
Overall, the proposed method demonstrated a strong diagnostic capability on the imbalanced dataset, without unduly favoring the majority class of healthy individuals, while maintaining an adequate balance in precisely recognizing the minority VCP class.
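The robustness metrics reported in the abstract can all be derived from a binary confusion matrix. As a minimal illustration (the counts below are hypothetical and not the paper's data), the False Discovery Rate, Cohen's Kappa, MCC, and F1-score are computed as follows:

```python
import math

def robustness_metrics(tp, fp, fn, tn):
    """Compute FDR, Cohen's Kappa, MCC, and F1 from binary confusion-matrix counts.

    tp/fp/fn/tn are counts with the minority (VCP) class taken as positive.
    """
    n = tp + fp + fn + tn
    # False Discovery Rate: fraction of positive predictions that are wrong
    fdr = fp / (tp + fp)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Cohen's Kappa: observed agreement corrected for chance agreement
    po = (tp + tn) / n
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (po - pe) / (1 - pe)
    # Matthews Correlation Coefficient: balanced measure robust to class imbalance
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return fdr, kappa, mcc, f1

# Hypothetical counts: 9 true positives, 1 false positive, 1 false negative, 9 true negatives
fdr, kappa, mcc, f1 = robustness_metrics(9, 1, 1, 9)
```

Because Kappa and MCC discount chance agreement and account for all four confusion-matrix cells, they are better suited than raw accuracy for judging a classifier on an imbalanced dataset, which is why the paper reports them alongside accuracy.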

Keywords