IEEE Access (Jan 2022)

RS-MSConvNet: A Novel End-to-End Pathological Voice Detection Model

  • Wongsathon Pathonsuwan,
  • Khomdet Phapatanaburi,
  • Prawit Buayai,
  • Talit Jumphoo,
  • Patikorn Anchuen,
  • Monthippa Uthansakul,
  • Peerapong Uthansakul

DOI
https://doi.org/10.1109/ACCESS.2022.3219606
Journal volume & issue
Vol. 10
pp. 120450 – 120461

Abstract

Recent studies have reported the success of the multi-scale convolutional neural network (MSConvNet) model in many classification applications, owing to its multi-scale convolution block, which extracts multi-scale representations for detection. However, an MSConvNet-based design for pathological voice detection has not yet been explored. In this paper, we propose RS-MSConvNet, a novel end-to-end MSConvNet model that uses raw speech for pathological voice detection. The main contribution of the proposed RS-MSConvNet method is to exploit a multi-scale convolution block, followed by a spatial-temporal feature block and a fully connected layer for classification. In addition, to further improve accuracy, we propose a novel hybrid detection model, called RS-MSConvNet-SVM, that integrates the feature extraction ability of the RS-MSConvNet model with a support vector machine (SVM) classifier. The effectiveness of the proposed models is investigated using the TORGO database. The experimental results show that the RS-MSConvNet model outperforms the baseline methods in the speaker-independent task. Moreover, compared with the RS-MSConvNet model, the RS-MSConvNet-SVM model achieves a further improvement in accuracy. These outcomes indicate that the proposed models are useful for pathological voice detection.
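
As a rough illustration of the pipeline the abstract describes (multi-scale convolution on raw speech, then a feature block, then a fully connected classifier), the following NumPy sketch may help. It is not the authors' implementation: the kernel sizes, the random untrained weights, the use of global average pooling as a stand-in for the spatial-temporal feature block, and all function names are assumptions made for illustration only.

```python
import numpy as np


def conv1d_same(x, kernel):
    """Cross-correlate a 1-D signal with one kernel, keeping the input length."""
    # np.convolve flips the kernel, so flip it first to get cross-correlation.
    return np.convolve(x, kernel[::-1], mode="same")


def multi_scale_block(x, kernel_sizes=(3, 9, 27), rng=None):
    """Run parallel 1-D convolutions at several scales on a raw waveform.

    Kernel sizes and random weights are illustrative assumptions,
    not values taken from the paper.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    branches = []
    for k in kernel_sizes:
        kernel = rng.standard_normal(k) / np.sqrt(k)
        branches.append(np.maximum(conv1d_same(x, kernel), 0.0))  # ReLU
    return np.stack(branches)  # shape: (n_scales, n_samples)


def detect(x, weights, bias, kernel_sizes=(3, 9, 27)):
    """Return a pathological-voice score in (0, 1) for one waveform."""
    features = multi_scale_block(x, kernel_sizes)
    # Placeholder for the spatial-temporal feature block (assumption):
    # global average pooling over time, one value per scale.
    pooled = features.mean(axis=1)
    logit = pooled @ weights + bias  # fully connected layer, one output unit
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid


# Toy usage on a synthetic 0.1 s "waveform" sampled at 16 kHz.
rng = np.random.default_rng(42)
signal = rng.standard_normal(1600)
features = multi_scale_block(signal)
prob = detect(signal, weights=rng.standard_normal(3), bias=0.0)
```

In the hybrid variant the abstract mentions, the pooled features would be passed to an SVM classifier instead of the fully connected layer; that step is omitted here to keep the sketch dependency-free beyond NumPy.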

Keywords