IEEE Access (Jan 2024)
Cons-KD: Dropout-Robust Knowledge Distillation for CTC-Based Automatic Speech Recognition
Abstract
In recent years, there has been growing interest in applying knowledge distillation (KD) techniques to the connectionist temporal classification (CTC) framework to train more efficient speech recognition models. Although conventional KD approaches successfully reduce the computational burden, they have limitations in dealing with the inconsistency caused by dropout regularization, namely the gap between the training and inference stages. In the context of KD, this inconsistency may hinder the performance improvement of the student model. To overcome this issue, we propose a novel approach, Cons-KD, that combines KD and consistency regularization: the former trains the student model to benefit from the knowledge of the teacher model, while the latter trains the student model to be more robust to the dropout-induced inconsistency. By directly mitigating this inconsistency, our KD framework further improves the student’s performance compared to vanilla KD. Experimental results on the LibriSpeech dataset demonstrate that Cons-KD significantly outperforms previous KD methods, improving the word error rate (WER) from 5.10% to 4.13% on the test-clean subset and from 12.87% to 10.32% on the test-other subset. These improvements correspond to relative error rate reductions (RERRs) of 19.02% and 19.81%, respectively, marking a notable advance beyond conventional KD methods. Additionally, we conduct an in-depth analysis to verify the effect of each proposed objective.
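To make the idea concrete, the following PyTorch-style sketch illustrates one plausible formulation of such a combined objective: a CTC loss, a frame-level KD term toward the teacher, and a bidirectional-KL consistency term between two dropout-perturbed student passes. The function name cons_kd_loss, the loss weights alpha and beta, the blank index, and the assumption that both models return (T, B, V) log-probabilities are illustrative; the exact objectives and weighting used in Cons-KD are defined in the paper itself.

    import torch
    import torch.nn.functional as F

    def cons_kd_loss(student, teacher, feats, feat_lens, targets, target_lens,
                     alpha=1.0, beta=1.0):
        # Two stochastic forward passes through the student (dropout active)
        # yield different per-frame distributions for the same input.
        # Both models are assumed to return log-probabilities of shape (T, B, V).
        s_logp1 = student(feats)  # dropout sample 1
        s_logp2 = student(feats)  # dropout sample 2

        with torch.no_grad():
            t_logp = teacher(feats)  # teacher runs without gradient

        # Standard CTC loss on one student pass (blank index is an assumption).
        ctc = F.ctc_loss(s_logp1, targets, feat_lens, target_lens, blank=0)

        # Frame-level KD: KL(teacher || student), averaged over both passes.
        kd = 0.5 * (F.kl_div(s_logp1, t_logp, log_target=True, reduction="batchmean")
                    + F.kl_div(s_logp2, t_logp, log_target=True, reduction="batchmean"))

        # Consistency regularization: symmetric KL between the two dropout
        # samples, pushing the student to be robust to dropout-induced variance.
        cons = 0.5 * (F.kl_div(s_logp1, s_logp2, log_target=True, reduction="batchmean")
                      + F.kl_div(s_logp2, s_logp1, log_target=True, reduction="batchmean"))

        return ctc + alpha * kd + beta * cons

In this sketch, the consistency term is computed between two forward passes of the same student under different dropout masks, so minimizing it directly reduces the train/inference gap that the abstract identifies; the KD and consistency terms are simply weighted and summed with the CTC objective.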
Keywords