Deep neural networks for automatic speech processing: a survey from large corpora to limited data

Vincent Roger; Jérôme Farinas; Julien Pinquier

doi:10.1186/s13636-022-00251-w

EURASIP Journal on Audio, Speech, and Music Processing (Aug 2022)

Affiliations

Vincent Roger: IRIT, Université de Toulouse, CNRS
Jérôme Farinas: IRIT, Université de Toulouse, CNRS
Julien Pinquier: IRIT, Université de Toulouse, CNRS

DOI: https://doi.org/10.1186/s13636-022-00251-w
Journal volume & issue: Vol. 2022, no. 1
pp. 1 – 15

Abstract

Most state-of-the-art speech systems use deep neural networks (DNNs). These systems require large amounts of training data. Hence, training state-of-the-art frameworks on under-resourced speech tasks is difficult. For example, only a limited amount of data may be available to model impaired speech. Furthermore, acquiring more data and/or expertise is time-consuming and expensive. In this paper, we focus on the following speech processing tasks: automatic speech recognition, speaker identification, and emotion recognition. To assess the problem of limited data, we first investigate state-of-the-art automatic speech recognition systems, as this is the hardest task (due to the wide variability in each language). Next, we provide an overview of techniques and tasks requiring less data. In the last section, we investigate few-shot techniques by interpreting under-resourced speech as a few-shot problem. In that sense, we propose an overview of few-shot techniques and discuss the possibility of using such techniques for the speech problems addressed in this survey. Admittedly, the reviewed techniques are not well adapted for large datasets. Nevertheless, some promising results from the literature encourage the usage of such techniques for speech processing.

Published in EURASIP Journal on Audio, Speech, and Music Processing

ISSN: 1687-4722 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Science: Physics: Acoustics. Sound; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://asmp-eurasipjournals.springeropen.com
