Dysarthric Speech Recognition Using Pseudo-Labeling, Self-Supervised Feature Learning, and a Joint Multi-Task Learning Approach

Ryoichi Takashima; Yuya Sawa; Ryo Aihara; Tetsuya Takiguchi; Yoshie Imai

doi:10.1109/ACCESS.2024.3374874

IEEE Access (Jan 2024)

Affiliations

Ryoichi Takashima: Graduate School of System Informatics, Kobe University, Kobe, Japan
Yuya Sawa: Graduate School of System Informatics, Kobe University, Kobe, Japan
Ryo Aihara: Information Technology Research and Development Center, Mitsubishi Electric Corporation, Kamakura, Japan
Tetsuya Takiguchi: Graduate School of System Informatics, Kobe University, Kobe, Japan
Yoshie Imai: Information Technology Research and Development Center, Mitsubishi Electric Corporation, Kamakura, Japan

Journal volume & issue: Vol. 12
pp. 36990–36999

Abstract

In this paper, we investigate the use of spontaneous speech from dysarthric speakers to train automatic speech recognition (ASR) models for them. Spontaneous speech from dysarthric speakers is relatively easy to collect compared with script-reading speech, which is obtained by having speakers read a prepared script, but labeling it is very difficult and costly. Pseudo-labeling and self-supervised feature learning have been studied as effective approaches for training ASR models with unlabeled speech data; however, their effectiveness on unlabeled dysarthric speech has not been established. Moreover, pseudo-labeling may be ineffective because the pseudo-labels generated for dysarthric speech contain many errors and are therefore unreliable. In this paper, we evaluate both approaches for dysarthric speech recognition, and we propose a multi-task learning approach that combines them to train an ASR model that is robust against errors in the pseudo-labels. Experimental results on Japanese and English datasets demonstrated that all of the approaches are effective and that the proposed multi-task learning approach achieved the best performance.
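The abstract does not state the training objective, so the following is only a minimal, hypothetical Python sketch of the general idea of combining a pseudo-label loss with a self-supervised loss in one weighted multi-task objective. It is not the authors' method. The frame-level cross-entropy (practical ASR systems typically use sequence-level losses such as CTC or attention-based losses), the toy masked-feature reconstruction loss standing in for the self-supervised objective, the fixed interpolation weight `weight`, and all function names are illustrative assumptions.

```python
import math
from typing import List, Sequence

# Hypothetical helpers for illustration only; not the paper's implementation.

def pseudo_labels(teacher_probs: Sequence[Sequence[float]]) -> List[int]:
    """Pick the most probable class per frame from a seed (teacher) model's outputs."""
    return [max(range(len(p)), key=p.__getitem__) for p in teacher_probs]

def pseudo_label_loss(student_probs: Sequence[Sequence[float]],
                      labels: Sequence[int]) -> float:
    """Mean frame-level cross-entropy against (possibly noisy) pseudo-labels."""
    return -sum(math.log(p[y]) for p, y in zip(student_probs, labels)) / len(labels)

def masked_reconstruction_loss(targets: Sequence[Sequence[float]],
                               predictions: Sequence[Sequence[float]]) -> float:
    """Toy stand-in for a self-supervised loss: MSE on masked feature frames."""
    squared = [(t - q) ** 2
               for tf, qf in zip(targets, predictions)
               for t, q in zip(tf, qf)]
    return sum(squared) / len(squared)

def joint_multitask_loss(asr_loss: float, ssl_loss: float,
                         weight: float = 0.5) -> float:
    """Weighted sum of the pseudo-label ASR loss and the self-supervised loss."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return weight * asr_loss + (1.0 - weight) * ssl_loss

if __name__ == "__main__":
    teacher = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]]  # seed-model frame posteriors
    student = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]  # model being trained
    labels = pseudo_labels(teacher)
    asr = pseudo_label_loss(student, labels)
    ssl = masked_reconstruction_loss([[1.0, 2.0]], [[1.5, 2.0]])
    print(labels, asr, ssl, joint_multitask_loss(asr, ssl, weight=0.5))
```

Lowering `weight` in this sketch shifts emphasis away from the possibly erroneous pseudo-labels toward the label-free self-supervised term, which is one plausible intuition for why a joint objective could be less sensitive to pseudo-label errors.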

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639