Reconsidering Read and Spontaneous Speech: Causal Perspectives on the Generation of Training Data for Automatic Speech Recognition

Philipp Gabler; Bernhard C. Geiger; Barbara Schuppler; Roman Kern

doi:10.3390/info14020137

Information (Feb 2023)

Affiliations

Philipp Gabler: Area of Knowledge Discovery, Know-Center GmbH, 8010 Graz, Austria
Bernhard C. Geiger: Area of Knowledge Discovery, Know-Center GmbH, 8010 Graz, Austria
Barbara Schuppler: Signal Processing and Speech Communication Laboratory, Graz University of Technology, 8010 Graz, Austria
Roman Kern: Area of Knowledge Discovery, Know-Center GmbH, 8010 Graz, Austria

DOI: https://doi.org/10.3390/info14020137
Journal volume & issue: Vol. 14, no. 2, p. 137

Abstract

Superficially, read and spontaneous speech, the two main kinds of training data for automatic speech recognition, appear complementary but equal in form: both consist of pairs of texts and acoustic signals. Yet spontaneous speech is typically harder to recognise. This is usually explained by different kinds of variation and noise, but there is a more fundamental difference at play: for read speech, the audio signal is produced by reciting a given text, whereas for spontaneous speech, the text is transcribed from a given signal. In this review, we embrace this difference by presenting a first introduction of causal reasoning into automatic speech recognition and describing causality as a tool for studying speaking styles and training data. After breaking down the data generation processes of read and spontaneous speech and analysing the domain from a causal perspective, we highlight how data generation by annotation must affect the interpretation of inference and performance. Our work discusses how various results from the causality literature on the impact of the direction of data generation mechanisms on learning and prediction apply to speech data. Finally, we argue that a causal perspective can support the understanding of models in speech processing with regard to their behaviour, capabilities, and limitations.

Published in Information

ISSN: 2078-2489 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: http://www.mdpi.com/journal/information/
