Evaluation of the effectiveness and efficiency of state-of-the-art features and models for automatic speech recognition error detection

Asmaa El Hannani; Rahhal Errattahi; Fatima Zahra Salmam; Thomas Hain; Hassan Ouahmane

doi:10.1186/s40537-020-00391-w

Journal of Big Data (Jan 2021)

Affiliations

Asmaa El Hannani: Laboratory of Information Technologies, National School of Applied Sciences, University of Chouaib Doukkali
Rahhal Errattahi: Laboratory of Information Technologies, National School of Applied Sciences, University of Chouaib Doukkali
Fatima Zahra Salmam: LAROSERI Laboratory, University of Chouaib Doukkali
Thomas Hain: Speech and Hearing Group, Department of Computer Science, University of Sheffield
Hassan Ouahmane: Laboratory of Information Technologies, National School of Applied Sciences, University of Chouaib Doukkali

DOI: https://doi.org/10.1186/s40537-020-00391-w
Journal volume & issue: Vol. 8, no. 1
pp. 1 – 16

Abstract

Speech-based human-machine interaction and natural language understanding applications have seen rapid development and wide adoption over the last few decades. This has led to a proliferation of studies that investigate error detection and classification in Automatic Speech Recognition (ASR) systems. However, different data sets and evaluation protocols are used, making direct comparisons of the proposed approaches (e.g. features and models) difficult. In this paper we perform an extensive evaluation of the effectiveness and efficiency of state-of-the-art approaches in a unified framework for both error detection and error type classification. We make three primary contributions: (1) we compare our Variant Recurrent Neural Network (V-RNN) model with three other state-of-the-art neural-based models and show that the V-RNN model is the most effective classifier for ASR error detection in terms of accuracy and speed; (2) we compare four feature settings, corresponding to different categories of predictor features, and show that the generic features are particularly suitable for real-time ASR error detection applications; and (3) we examine the post-generalization ability of our error detection framework and perform a detailed post-detection analysis in order to identify the recognition errors that are difficult to detect.

Published in Journal of Big Data

ISSN: 2196-1115 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware; Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://journalofbigdata.springeropen.com
