Echo: A crowd-sourced Romanian speech dataset.

Remus-Dan Ungureanu; Mihai Dascalu

doi:10.55612/s-5002-062-009

Interaction Design and Architecture(s) (Nov 2024)

Journal volume & issue: no. 62
pp. 141 – 152

Abstract

Romanian is the seventh most popular European language, with around 30 million speakers worldwide. Despite its popularity, available speech resources are limited. As a result, few models transcribe Romanian well, and most of those are multilingual models that also cover less popular languages. Echo is a crowd-sourcing platform that has collected more than 300 hours of speech from various contributors. In this study, we document how a large speech dataset enables researchers to train automatic speech recognition, speaker verification, and diarization models to automatically process students’ notes. We publicly release both the dataset and the Whisper-based baseline model as open source.

Published in Interaction Design and Architecture(s)

ISSN: 1826-9745 (Print); 2283-2998 (Online)
Publisher: ASLERD
Country of publisher: Italy
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: https://ixdea.org/