EURASIP Journal on Audio, Speech, and Music Processing (Dec 2024)
Domain-weighted transfer learning and discriminative embeddings for low-resource speaker verification
Abstract
Transfer learning has been shown to be effective in enhancing speaker verification performance in low-resource conditions. However, the inclusion of additional datasets may cause domain mismatch, and a mismatch between data volume and model complexity during fine-tuning can further degrade speaker verification performance. In this paper, we propose a domain-weighted allocation fine-tuning strategy (DWA-FT) that employs the Kernel Mean Matching (KMM) algorithm to reduce the distribution difference between the in-domain and out-of-domain datasets. KMM assigns a weight to each sample in the source datasets, and the maximum mean discrepancy (MMD) distance is used to measure the effectiveness of the distribution adaptation. DWA-FT effectively mitigates domain mismatch during model training. We also propose two backend embedding transformation methods based on canonical correlation analysis (CCA), namely CCA embedding fusion and CCA embedding constraint, which aim to enhance the quality of speaker embeddings. The experimental results demonstrate that the proposed methods effectively enhance the performance of the speaker verification system in low-resource scenarios. Compared to the baseline, our methods achieve relative improvements of 51.03% with PLDA scoring and 46.02% with cosine similarity scoring on the Himia dataset.
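To make the role of the MMD distance concrete, the following is a minimal sketch, not the authors' implementation, of how a weighted squared MMD between source and target samples could be evaluated with a Gaussian kernel. The function names `rbf_kernel` and `weighted_mmd2`, the kernel choice, and the bandwidth parameter `gamma` are assumptions made for illustration. The sketch does not solve the KMM optimisation itself; it only scores a given weight vector, so that a lower value indicates a weighted source distribution closer to the target.

```python
import numpy as np


def rbf_kernel(a, b, gamma=0.5):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    # Pairwise squared Euclidean distances via the expansion |x|^2 + |y|^2 - 2 x.y
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def weighted_mmd2(source, target, weights=None, gamma=0.5):
    """Biased estimate of the squared MMD between weighted source and target samples.

    `weights` holds one non-negative weight per source sample (hypothetical
    stand-in for KMM output); uniform weights are used when it is None.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    if weights is None:
        weights = np.ones(len(source))
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalise so the weights form a distribution over source samples

    k_ss = rbf_kernel(source, source, gamma)
    k_st = rbf_kernel(source, target, gamma)
    k_tt = rbf_kernel(target, target, gamma)

    # ||sum_i w_i phi(s_i) - mean_j phi(t_j)||^2 expanded in kernel terms
    return float(w @ k_ss @ w - 2.0 * (w @ k_st).mean() + k_tt.mean())
```

In this toy setting, down-weighting source samples that lie far from the target data should lower the returned value relative to uniform weights, which is the kind of check the MMD distance provides for a weighting scheme.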
Keywords