IEEE Access (Jan 2024)

Priority-Encoder Ensemble for Speech Recognition

  • Dzeuban Fenyom Ivan,
  • Daison Darlan,
  • Adeyinka Adedigba,
  • Oladayo S. Ajani,
  • Rammohan Mallipeddi,
  • Hwang Jae Joo

DOI
https://doi.org/10.1109/ACCESS.2024.3454221
Journal volume & issue
Vol. 12
pp. 123731 – 123738

Abstract

The advancement in computational capabilities and the availability of vast datasets have propelled the performance of Automatic Speech Recognition (ASR) systems. However, the task of ASR is complex, requiring consideration of diverse factors such as spoken tone, intonation, accents, and pitch modulation. To tackle these challenges, ensembles of Large Language Models (LLMs) have emerged as a promising approach, harnessing the strengths of multiple models to improve recognition accuracy. These ensembles, employing various strategies, often incur significant time requirements during inference, limiting their applicability in real-life scenarios. In this study, we introduce a novel ensemble strategy, the Priority-Encoder Ensemble (PE-Ensemble), for ASR systems. The PE-Ensemble employs a meta-learning-based Decider model to dynamically select the optimal model from the ensemble for inference, significantly reducing the computational load and memory requirements during inference. Unlike traditional ensembles, where all models are loaded into memory, our approach requires only a single model to be loaded, enhancing efficiency in real-world applications such as unmanned kiosks. We evaluate the PE-Ensemble against the commonly used average ensemble strategy and individual base models. The results demonstrate that the PE-Ensemble outperforms both the average ensemble and the individual base models in prediction accuracy as well as computational time during inference. This enhancement in accuracy, coupled with the substantial reduction in computational load, highlights the efficacy and practical applicability of the proposed PE-Ensemble approach.
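
The following is a minimal sketch of the inference flow the abstract describes: a lightweight Decider chooses one base model per utterance, so only that model is loaded and run. All names here (BASE_MODELS, decider, pe_ensemble_transcribe, the feature extraction) are illustrative assumptions, not the paper's actual implementation or its trained meta-learner.

```python
from typing import Callable, Dict, List

# Registry of base ASR models with lazy loaders, so a model is only
# materialized in memory if the Decider actually selects it (assumption:
# real loaders would restore model weights from disk).
BASE_MODELS: Dict[str, Callable[[], Callable[[List[float]], str]]] = {
    "model_a": lambda: (lambda audio: "transcript from model A"),
    "model_b": lambda: (lambda audio: "transcript from model B"),
}


def decider(features: List[float]) -> str:
    """Toy stand-in for the meta-learning-based Decider: maps utterance-level
    features to the id of the base model expected to perform best. The paper's
    Decider is a trained model; this placeholder rule is for illustration only."""
    return "model_a" if sum(features) >= 0.0 else "model_b"


def pe_ensemble_transcribe(audio: List[float]) -> str:
    features = audio[:10]              # stand-in for utterance feature extraction
    chosen_id = decider(features)      # pick a single base model for this input
    model = BASE_MODELS[chosen_id]()   # load only the selected model
    return model(audio)                # run inference with that one model


if __name__ == "__main__":
    print(pe_ensemble_transcribe([0.1, -0.2, 0.3]))
```

Under these assumptions, the memory and compute cost per utterance is that of one base model plus the (much smaller) Decider, in contrast to an average ensemble, which must load and run every base model and combine their outputs.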

Keywords