Mi-Go: tool which uses YouTube as data source for evaluating general-purpose speech recognition machine learning models

Tomasz Wojnar; Jarosław Hryszko; Adam Roman

doi:10.1186/s13636-024-00343-9

EURASIP Journal on Audio, Speech, and Music Processing (May 2024)

Affiliations

Tomasz Wojnar: Jagiellonian University, Faculty of Mathematics and Computer Science, Division of Software Engineering
Jarosław Hryszko: Jagiellonian University, Faculty of Mathematics and Computer Science, Division of Software Engineering
Adam Roman: Jagiellonian University, Faculty of Mathematics and Computer Science, Division of Software Engineering

DOI: https://doi.org/10.1186/s13636-024-00343-9
Journal volume & issue: Vol. 2024, no. 1, pp. 1–17

Abstract

This article introduces Mi-Go, a tool for evaluating the performance and adaptability of general-purpose speech recognition machine learning models across diverse real-world scenarios. The tool leverages YouTube as a rich and continuously updated data source that covers multiple languages, accents, dialects, speaking styles, and audio quality levels. To demonstrate the tool's effectiveness, an experiment was conducted in which Mi-Go was used to evaluate state-of-the-art automatic speech recognition machine learning models on a total of 141 randomly selected YouTube videos. The results underscore the value of YouTube as a data source for evaluating speech recognition models, helping to ensure their robustness, accuracy, and adaptability to diverse languages and acoustic conditions. Additionally, by contrasting machine-generated transcriptions with human-made subtitles, Mi-Go can help pinpoint potential misuse of YouTube subtitles, such as for search engine optimization.

Published in EURASIP Journal on Audio, Speech, and Music Processing

ISSN: 1687-4722 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Science: Physics: Acoustics. Sound; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://asmp-eurasipjournals.springeropen.com
