Electronic Research Archive (Jul 2023)

Speech recognition of south China languages based on federated learning and mathematical construction

  • Weiwei Lai,
  • Yinglong Zheng

DOI
https://doi.org/10.3934/era.2023255
Journal volume & issue
Vol. 31, no. 8
pp. 4985 – 5005

Abstract

As speech recognition technology and computer processing power continue to advance, recognition technologies are being integrated into a growing range of software platforms, enabling intelligent speech processing. Using speech recognition and distributed processing technology, we build a comprehensive processing platform for multilingual resources used in the business and security fields. Based on the federated learning model, this study develops a speech recognition method and its mathematical model for the languages of South China. It also creates a speech dataset for South China dialects, which currently covers Mandarin and three dialects widely spoken in the Guangdong region: Cantonese, Chaoshan and Hakka. To address the unequal label distribution in the dataset, it applies two data enhancement techniques suited to speech signal characteristics: audio enhancement and spectrogram enhancement. The study further proposes a dialect classification model based on hybrid domain attention, which combines this structure with a hyperbolic tangent activation function and spatial domain attention. In experiments, the model reaches a macro-average F-value of 91.54%, a result the authors compare against earlier work in the field.
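
The macro-average F-value reported above is a natural metric when labels are unevenly distributed, because each class counts equally no matter how many samples it has. The sketch below is illustrative only and is not taken from the paper: it assumes the F-value is the F1 score (beta = 1), and the function name `macro_f1` and the toy labels are hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumes F-value means F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("macro_f1 needs at least one sample")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); the denominator is non-zero
        # because every label occurs in y_true or y_pred.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom)
    return sum(scores) / len(scores)


# Toy, made-up labels: a classifier that always answers "mandarin".
y_true = ["mandarin"] * 4 + ["hakka"]
y_pred = ["mandarin"] * 5
print(round(macro_f1(y_true, y_pred), 3))
```

On this toy input the accuracy is 80%, yet the macro-average F1 is far lower because the minority class is never recognised, which is why the metric suits imbalanced dialect data.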

Keywords