Transactions of the International Society for Music Information Retrieval (Feb 2024)
A Dataset of Larynx Microphone Recordings for Singing Voice Reconstruction
Abstract
Larynx microphones (LMs) make it possible to obtain practically crosstalk-free recordings of the human voice by picking up vibrations directly from the throat. This can be useful in a multitude of music information retrieval scenarios related to singing, e.g., the analysis of individual voices recorded in environments with strong interfering noise. However, LMs have a limited frequency range and barely capture the effects of the vocal tract, which makes the recorded signal unsuitable for downstream tasks that require high-quality recordings. In this paper, we introduce the task of reconstructing a natural-sounding, high-quality singing voice recording from an LM signal. With its explicit focus on the singing voice, the problem lies at the intersection of speech enhancement and singing voice synthesis, with the additional requirement of faithfully reproducing expressive parameters such as intonation. In this context, we make three main contributions. First, we publish a dataset comprising over four hours of popular music, recorded with four amateur singers accompanied by a guitar, in which both LM and clean close-up microphone signals are available. Second, we propose a data-driven baseline approach for singing voice reconstruction from LM signals using differentiable signal processing, inspired by a source-filter model that emulates the missing vocal tract effects. Third, we evaluate the baseline with a listening test and further show that it can improve the accuracy of lyrics transcription as an exemplary downstream task.
Keywords