IEEE Access (Jan 2023)

Speech Enhancement Using Dynamic Learning in Knowledge Distillation via Reinforcement Learning

  • Shih-Chuan Chu,
  • Chung-Hsien Wu,
  • Tsai-Wei Su

DOI
https://doi.org/10.1109/ACCESS.2023.3343738
Journal volume & issue
Vol. 11
pp. 144421 – 144434

Abstract

In recent years, most research on speech enhancement (SE) has improved performance by applying different strategies to deep neural network models. However, as performance improves, the memory footprint and computational requirements of these models also grow, making them difficult to deploy directly for edge computing. Various model compression and acceleration techniques are therefore desired. This paper proposes a learning method that dynamically applies Knowledge Distillation (KD) to train a small student model from a large teacher model, using reinforcement learning (RL) to determine the learning ratio between the teacher's output and the real target. During KD training, RL estimates this learning ratio from a reward that favors either the hard target (clean speech) or the soft target (the output of the teacher model). The proposed method yields a more stable training process for the smaller SE model and improves its performance. In the experiments, we used the TIMIT and CSTR VCTK datasets and evaluated two representative SE models that employ different loss functions. On the TIMIT dataset, when the number of parameters in the Wave-U-Net student model was reduced from 10.3 million to 2.6 million, our method outperformed non-KD models with improvements of 0.05 in PESQ, 0.1 in STOI, and 0.47 in the scale-invariant signal-to-distortion ratio. Moreover, by utilizing prior knowledge from the pre-trained teacher model, our method effectively guided the learning process of the student model, achieving excellent performance even under low-SNR conditions. We further validated the proposed method with Conv-TasNet. Finally, for ease of comparison, we also conducted a comparison on the VCTK dataset.
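To make the core idea concrete, the following is a minimal sketch (not the authors' released code) of a distillation loss whose learning ratio blends the soft-target loss (student vs. teacher output) with the hard-target loss (student vs. clean speech). In the paper this ratio is estimated by an RL agent; here the `policy` stand-in, the choice of L1 loss, and the tensor shapes are illustrative assumptions only.

    # Sketch of a dynamic KD loss for waveform-domain SE, assuming PyTorch.
    import torch
    import torch.nn.functional as F

    def dynamic_kd_loss(student_out, teacher_out, clean_speech, alpha):
        """Blend soft-target (teacher output) and hard-target (clean speech) losses.

        alpha: learning ratio in [0, 1]; larger alpha weights the teacher's output more.
        """
        soft_loss = F.l1_loss(student_out, teacher_out.detach())  # learn from the teacher
        hard_loss = F.l1_loss(student_out, clean_speech)          # learn from the real target
        return alpha * soft_loss + (1.0 - alpha) * hard_loss

    def train_step(student, teacher, policy, optimizer, noisy, clean):
        # noisy, clean: (batch, samples) waveforms; `policy` is a hypothetical stand-in
        # for the RL agent that selects the learning ratio in the proposed method.
        with torch.no_grad():
            teacher_out = teacher(noisy)
        student_out = student(noisy)
        alpha = policy(noisy)
        loss = dynamic_kd_loss(student_out, teacher_out, clean, alpha)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

In this sketch a larger alpha pushes the student toward imitating the teacher, while a smaller alpha pushes it toward the clean reference, which is the trade-off the paper's RL reward is described as controlling.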

Keywords