Applied Sciences (Mar 2023)

Multi-Scale Feature Learning for Language Identification of Overlapped Speech

  • Zuhragvl Aysa,
  • Mijit Ablimit,
  • Askar Hamdulla

DOI
https://doi.org/10.3390/app13074235
Journal volume & issue
Vol. 13, no. 7
p. 4235

Abstract

Language identification is the front end of multilingual speech-processing tasks. This study aims to improve the accuracy of language identification in complex acoustic environments by proposing a multi-scale feature extraction method. The method replaces the baseline feature extraction network with a multi-scale feature extraction network (SE-Res2Net-CBAM-BILSTM). A multilingual cocktail party dataset was simulated, and comparative experiments were conducted with various models. The experimental results show that the proposed model achieved language identification accuracies of 97.6% on an Oriental language dataset and 75% on the multilingual cocktail party dataset. Furthermore, comparative experiments show that our model outperformed three other models in accuracy, recall, and F1 score. Finally, a comparison of different loss functions shows that the model performed better when using focal loss.
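
The abstract reports that focal loss gave better performance but does not state the formulation or hyperparameters used. As a minimal sketch, assuming the standard multi-class form FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), the example below shows how focal loss down-weights easy, confidently classified examples relative to plain cross-entropy. The function names and the values gamma = 2.0 and alpha = 1.0 are illustrative assumptions, not settings taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Standard cross-entropy for one example with an integer class label."""
    return -math.log(softmax(logits)[target])


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Focal loss for one example.

    gamma and alpha are assumed defaults for illustration only;
    the paper's actual values are not given in the abstract.
    """
    p_t = softmax(logits)[target]  # probability assigned to the true class
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    easy = [5.0, 0.0, 0.0]  # true class 0 is predicted confidently
    hard = [0.0, 5.0, 0.0]  # true class 0 is predicted poorly
    for name, logits in (("easy", easy), ("hard", hard)):
        ce = cross_entropy(logits, 0)
        fl = focal_loss(logits, 0)
        print(f"{name}: cross-entropy={ce:.4f} focal={fl:.4f}")
```

With gamma = 0 the expression reduces to cross-entropy scaled by alpha; larger gamma shifts the training signal toward hard examples, which may be one reason it helps on overlapped speech, although the abstract does not give an explanation.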

Keywords