IEEE Access (Jan 2020)
Transfer Learning Algorithm for Enhancing the Unlabeled Speech
Abstract
To improve the generalization ability of speech enhancement algorithms on unlabeled noisy speech, a speech enhancement transfer learning model based on the feature-attention multi-kernel maximum mean discrepancy (FA-MK-MMD) is proposed. The source domain is speech with known noise and labels, and the target domain is speech with unknown noise and no labels. The algorithm seeks a representation of the subspace shared by the two domains, that is, the part of the features extracted by the shared encoder that relates to clean speech. To obtain it, the algorithm uses MK-MMD as a loss function to reduce the distribution difference between the two domains, which can improve adaptability to unknown noise. Furthermore, since different noise types affect the shared-subspace representation differently, an attention mechanism is applied along the feature dimension to select the information that is less polluted by noise, which helps reconstruct the clean speech. For speech with unknown noise and no labels, experiments show that, compared with the baseline algorithm, the proposed algorithm improves the frequency-weighted segmental signal-to-noise ratio (fwsegSNR), the perceptual evaluation of speech quality (PESQ), and the short-time objective intelligibility (STOI).
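As an illustration of the MK-MMD term, the sketch below computes a biased (V-statistic) estimate of the squared MK-MMD between a batch of source-domain encoder features and a batch of target-domain features. It is a minimal sketch under stated assumptions, not the authors' implementation. The kernel is an equal-weight average of Gaussian kernels. The bandwidth set, the equal weights, and the function names `pairwise_sq_dists` and `mk_mmd2` are hypothetical choices for illustration. The feature-attention weighting is not modeled, and a real training loop would compute this in an autodiff framework instead of NumPy.

```python
import numpy as np

def pairwise_sq_dists(a, b):
    # Squared Euclidean distance ||a_i - b_j||^2 for every pair of rows.
    return (np.sum(a ** 2, axis=1)[:, None]
            + np.sum(b ** 2, axis=1)[None, :]
            - 2.0 * a @ b.T)

def mk_mmd2(xs, xt, bandwidths=(0.5, 1.0, 2.0, 4.0, 8.0)):
    """Biased estimate of squared MK-MMD between xs (n, d) and xt (m, d).

    Assumption: the multi-kernel is an equal-weight average of Gaussian
    kernels with the (hypothetical) bandwidths given above.
    """
    def kernel(a, b):
        # Clamp tiny negative values caused by floating-point cancellation.
        d2 = np.maximum(pairwise_sq_dists(a, b), 0.0)
        return sum(np.exp(-d2 / (2.0 * s ** 2)) for s in bandwidths) / len(bandwidths)

    # MMD^2 = E[k(s, s')] + E[k(t, t')] - 2 E[k(s, t)]
    return kernel(xs, xs).mean() + kernel(xt, xt).mean() - 2.0 * kernel(xs, xt).mean()

# Usage: a shifted target distribution yields a larger discrepancy.
rng = np.random.default_rng(0)
source = rng.normal(size=(200, 16))
target_same = rng.normal(size=(200, 16))
target_shifted = rng.normal(loc=1.0, size=(200, 16))
print(mk_mmd2(source, target_same), mk_mmd2(source, target_shifted))
```

Minimizing this quantity with respect to the shared encoder's parameters pulls the two feature distributions together. Using several bandwidths avoids committing to a single kernel scale, which is the motivation for the multi-kernel variant.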
Keywords