Speech Recognition for Air Traffic Control Utilizing a Multi-Head State-Space Model and Transfer Learning

Haijun Liang; Hanwen Chang; Jianguo Kong

doi:10.3390/aerospace11050390

Aerospace (May 2024)

Affiliations

Haijun Liang: College of Air Traffic Management, Civil Aviation Flight University of China, Guanghan 618307, China
Hanwen Chang: College of Air Traffic Management, Civil Aviation Flight University of China, Guanghan 618307, China
Jianguo Kong: College of Air Traffic Management, Civil Aviation Flight University of China, Guanghan 618307, China

Journal volume & issue: Vol. 11, no. 5
p. 390

Abstract


In this study, a novel end-to-end automatic speech recognition (ASR) framework, ResNeXt-Mssm-CTC, is developed for air traffic control (ATC) systems. The framework is built on the Multi-Head State-Space Model (Mssm) and incorporates transfer learning. Residual Networks with Cardinality (ResNeXt) use multi-layer convolutions with residual connections to improve the extraction of complex feature representations from speech signals. The Mssm is equipped with specialized gating mechanisms and parallel heads that learn both local and global temporal dynamics in sequence data. Connectionist temporal classification (CTC) is used for sequence labeling, which removes the need for forced alignment and accommodates labels of varying lengths. In addition, transfer learning is used to improve performance on the target task by leveraging knowledge acquired from a source task. The experimental results indicate that the proposed model outperforms the baseline models. Specifically, when pretrained on the Aishell corpus, the model achieves minimum character error rates (CERs) of 7.2% and 8.3%. When the model is then applied to the ATC corpus, the CER is reduced to 5.5% and 6.7%.
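Two standard pieces mentioned in the abstract can be illustrated concretely: how a CTC output is read without any forced alignment, and how the reported CER is computed. The sketch below is a generic illustration, not the authors' implementation. It assumes best-path (greedy) CTC decoding over per-frame labels, uses a hypothetical blank symbol `"_"` (real systems reserve a blank index in the output vocabulary), and computes CER as character-level edit distance divided by the reference length. The frame sequence and transcripts in the usage example are made up.

```python
# Generic sketch (not the paper's code): CTC best-path decoding and CER.

BLANK = "_"  # hypothetical blank symbol; real systems use a reserved index


def ctc_greedy_collapse(frame_labels: list[str], blank: str = BLANK) -> str:
    """Turn a per-frame best-path label sequence into an output string.

    CTC emits one label (or blank) per acoustic frame. The transcript is
    obtained by merging consecutive repeats and then deleting blanks, so no
    frame-to-character alignment is required and output length can vary.
    """
    out = []
    prev = None
    for label in frame_labels:
        # Keep a label only if it differs from the previous frame and is not
        # blank; a blank between two identical labels lets both survive.
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return "".join(out)


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev_row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        row = [i]
        for j, h in enumerate(hyp, start=1):
            row.append(min(
                prev_row[j] + 1,             # deletion of ref[i-1]
                row[j - 1] + 1,              # insertion of hyp[j-1]
                prev_row[j - 1] + (r != h),  # substitution, or match at cost 0
            ))
        prev_row = row
    return prev_row[-1]


def cer(ref: str, hyp: str) -> float:
    """CER = (S + D + I) / N, N = number of reference characters (N > 0)."""
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Made-up per-frame labels; the blank separates the two "C" characters.
    frames = ["C", "C", "_", "C", "A", "A", "_", "8", "8", "8", "1"]
    hyp = ctc_greedy_collapse(frames)
    print(hyp)                            # CCA81
    print(round(cer("CCA981", hyp), 4))   # 0.1667 (one deletion / 6 chars)
```

Merging repeats before removing blanks is what allows CTC to emit doubled characters: without the intervening blank, the two leading `"C"` frames would collapse to a single character.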

Published in Aerospace

ISSN: 2226-4310 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Motor vehicles. Aeronautics. Astronautics
Website: http://www.mdpi.com/journal/aerospace
