RS-MSConvNet: A Novel End-to-End Pathological Voice Detection Model

Wongsathon Pathonsuwan; Khomdet Phapatanaburi; Prawit Buayai; Talit Jumphoo; Patikorn Anchuen; Monthippa Uthansakul; Peerapong Uthansakul

doi:10.1109/ACCESS.2022.3219606

IEEE Access (Jan 2022)

Affiliations

Wongsathon Pathonsuwan: School of Telecommunication Engineering, Suranaree University of Technology, Nakhon Ratchasima, Thailand
Khomdet Phapatanaburi: Department of Telecommunication Engineering, Faculty of Engineering and Technology, Rajamangala University of Technology Isan (RMUTI), Nakhon Ratchasima, Thailand
Prawit Buayai: Graduate Faculty of Interdisciplinary Research, University of Yamanashi, Kofu, Japan
Talit Jumphoo: School of Telecommunication Engineering, Suranaree University of Technology, Nakhon Ratchasima, Thailand
Patikorn Anchuen: Navaminda Kasatriyadhiraj Royal Air Force Academy, Bangkok, Thailand
Monthippa Uthansakul: School of Telecommunication Engineering, Suranaree University of Technology, Nakhon Ratchasima, Thailand
Peerapong Uthansakul: School of Telecommunication Engineering, Suranaree University of Technology, Nakhon Ratchasima, Thailand

Journal volume & issue: Vol. 10
pp. 120450 – 120461

Abstract

Recent studies have reported the success of the multi-scale convolutional neural network (MSConvNet) model in many classification applications, owing to its ability to extract multi-scale representations through multi-scale convolution blocks. However, an MSConvNet-based design for pathological voice detection has not yet been explored. In this paper, we propose RS-MSConvNet, a novel end-to-end MSConvNet model that operates on raw speech for pathological voice detection. The main contribution of the proposed RS-MSConvNet method is to exploit a multi-scale convolution block, followed by a spatial-temporal feature block and a fully connected layer for classification. In addition, to further improve accuracy, we propose a novel hybrid detection model, called RS-MSConvNet-SVM, which combines the feature extraction ability of the RS-MSConvNet model with a support vector machine (SVM) classifier. The effectiveness of the proposed models is evaluated on the TORGO database. The experimental results show that the RS-MSConvNet model outperforms the baseline methods in the speaker-independent task. Moreover, compared with the RS-MSConvNet model, the RS-MSConvNet-SVM model achieves a further improvement in accuracy. These outcomes show that the proposed models are useful for pathological voice detection.
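The abstract describes the multi-scale convolution block only at a high level. The sketch below is a minimal NumPy illustration of the general idea: several 1-D convolution branches with different kernel sizes applied in parallel to the same raw waveform, with their outputs concatenated along the channel axis. Every specific here is an assumption for illustration, not the authors' configuration: the kernel sizes (3, 9, 27), eight filters per branch, ReLU activations, random untrained weights, a 16 kHz sample rate, and global average pooling. The helper names `conv1d_same`, `init_branch_weights`, and `multi_scale_block` are hypothetical, and the spatial-temporal feature block and training are omitted.

```python
import numpy as np


def conv1d_same(x, kernels):
    """'Same'-padded multi-channel 1-D cross-correlation (as in CNN layers).

    x:       (in_channels, time) signal
    kernels: (out_channels, in_channels, k) filter bank, k odd
    returns: (out_channels, time)
    """
    k = kernels.shape[2]
    pad = k // 2
    # Zero-pad the time axis so the output keeps the input length.
    xp = np.pad(x, ((0, 0), (pad, pad)))
    # windows[i, t, j] = xp[i, t + j]  -> shape (in_channels, time, k)
    windows = np.lib.stride_tricks.sliding_window_view(xp, k, axis=1)
    # Sum over input channels (i) and kernel taps (j).
    return np.einsum("itj,oij->ot", windows, kernels)


def init_branch_weights(in_channels, kernel_sizes, filters_per_branch, seed=0):
    """Random (untrained) filters, one bank per kernel size (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    return [
        rng.standard_normal((filters_per_branch, in_channels, k)) / np.sqrt(in_channels * k)
        for k in kernel_sizes
    ]


def multi_scale_block(x, branch_weights):
    """Run every branch on the same input, apply ReLU, concatenate channels."""
    outputs = [np.maximum(conv1d_same(x, w), 0.0) for w in branch_weights]
    return np.concatenate(outputs, axis=0)


if __name__ == "__main__":
    # One second of synthetic "raw speech" at an assumed 16 kHz sample rate.
    waveform = np.random.default_rng(1).standard_normal((1, 16000))
    weights = init_branch_weights(in_channels=1, kernel_sizes=(3, 9, 27), filters_per_branch=8)
    feature_maps = multi_scale_block(waveform, weights)  # (24, 16000)
    embedding = feature_maps.mean(axis=1)                 # global average pooling -> (24,)
    print(feature_maps.shape, embedding.shape)
```

Running the script prints `(24, 16000) (24,)`. Small kernels respond to fine, sample-level structure while larger kernels span longer stretches of the waveform, which is the motivation for combining several scales in one block. In a hybrid setup like RS-MSConvNet-SVM, a pooled embedding of this kind (taken from a trained network) would be passed to an SVM instead of the fully connected layer; scikit-learn's `sklearn.svm.SVC` is one readily available SVM implementation, although the abstract does not state which implementation or kernel the authors used.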

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
