IEEE Access (Jan 2024)
Ubranch Conformer: Integrating Up-Down Sampling and Branch Attention for Speech Recognition
Abstract
Conformer has become one of the most popular models in automatic speech recognition, achieving superior recognition performance by integrating a convolutional module into the Transformer. However, existing Conformer models incur high computational complexity in the attention mechanism when capturing global dependencies and lack the flexibility to adjust local dependencies dynamically, which limits their efficiency in processing sequences. To address these challenges, we propose a new model, Ubranch Conformer, which combines Up-Down sampling and branch attention to improve the capture of input information while reducing computational complexity. First, we design a new architecture with an Up-Down sampling strategy that adjusts the sampling of embedded sequences between blocks in the model’s intermediate layers, reducing complexity without compromising efficiency. Second, we introduce a more efficient block structure built mainly from branch attention and convolution gate modules, combining convolution with attention to improve the model’s ability and flexibility in capturing global and local dependencies. Experiments on multiple datasets show that Ubranch Conformer achieves better performance than existing models and strikes a good balance between performance and complexity.
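As a rough illustration of why resampling the embedded sequence between blocks can lower attention cost, the following minimal sketch shortens a sequence, counts the multiply-adds of the attention score matrix, and restores the original length. It is not the authors' implementation: the average pooling, the repetition-based upsampling, the factor of 2, and all function names are assumptions made only for this example.

```python
# Illustrative sketch only; pooling choice, factor, and names are assumptions,
# not the Ubranch Conformer implementation.
from typing import List

Frames = List[List[float]]  # T frames, each a d-dimensional embedding


def downsample(frames: Frames, factor: int = 2) -> Frames:
    """Average-pool consecutive groups of `factor` frames along time."""
    pooled = []
    for start in range(0, len(frames), factor):
        group = frames[start:start + factor]  # last group may be shorter
        dim = len(group[0])
        pooled.append([sum(f[i] for f in group) / len(group) for i in range(dim)])
    return pooled


def upsample(frames: Frames, factor: int, target_len: int) -> Frames:
    """Repeat each frame `factor` times, then trim to the original length."""
    repeated = [list(f) for f in frames for _ in range(factor)]
    return repeated[:target_len]


def attention_score_cost(seq_len: int, dim: int) -> int:
    """Multiply-adds for one head's QK^T score matrix: T * T * d."""
    return seq_len * seq_len * dim


if __name__ == "__main__":
    x = [[0.0, 2.0], [2.0, 4.0], [4.0, 6.0], [6.0, 8.0], [8.0, 10.0]]
    down = downsample(x, factor=2)                  # 5 frames become 3
    up = upsample(down, factor=2, target_len=len(x))  # back to 5 frames
    print(len(x), len(down), len(up))
    print(attention_score_cost(100, 64), attention_score_cost(50, 64))
```

Because the score matrix grows with the square of the sequence length, halving the length in the intermediate blocks cuts this term to a quarter, which is the general effect a down-then-up sampling scheme relies on.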
Keywords