Causal Diffusion Models for Generalized Speech Enhancement

Julius Richter; Simon Welker; Jean-Marie Lemercier; Bunlong Lay; Tal Peer; Timo Gerkmann

doi:10.1109/OJSP.2024.3379070

IEEE Open Journal of Signal Processing (Jan 2024)

Affiliations

Julius Richter: Signal Processing (SP), Universität Hamburg, Hamburg, Germany
Simon Welker: Signal Processing (SP), Universität Hamburg, Hamburg, Germany
Jean-Marie Lemercier: Signal Processing (SP), Universität Hamburg, Hamburg, Germany
Bunlong Lay: Signal Processing (SP), Universität Hamburg, Hamburg, Germany
Tal Peer: Signal Processing (SP), Universität Hamburg, Hamburg, Germany
Timo Gerkmann: Signal Processing (SP), Universität Hamburg, Hamburg, Germany

Journal volume & issue: Vol. 5, pp. 780–789

Abstract

In this work, we present a causal speech enhancement system that is designed to handle different types of corruptions. This paper is an extended version of our contribution to the “ICASSP 2023 Speech Signal Improvement Challenge”. The method is based on a generative diffusion model which has been shown to work well in scenarios beyond speech-in-noise, such as missing data and non-additive corruptions. We guarantee causal processing with an algorithmic latency of 20 ms by modifying the network architecture and removing non-causal normalization techniques. To train and test our model, we generate a new corrupted speech dataset which includes additive background noise, reverberation, clipping, packet loss, bandwidth reduction, and codec artifacts. We compare the causal and non-causal versions of our method to investigate the impact of causal processing and we assess the gap between specialized models trained on a particular corruption type and the generalized model trained on all corruptions. Although specialized models and non-causal models have a small advantage, we show that the generalized causal approach does not suffer from a significant performance penalty, while it can be flexibly employed for real-world applications where different types of distortions may occur.
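Two of the corruption types named in the abstract, clipping and packet loss, can be illustrated with a minimal sketch. This is not the authors' data-generation pipeline: the function names `clip` and `drop_packets`, the 16 kHz sampling rate, the 0.5 clipping threshold, the 320-sample packet length, and the 20 % loss probability are all assumptions chosen for illustration.

```python
import math
import random


def clip(signal, threshold):
    """Hard-clip every sample to the range [-threshold, threshold]."""
    return [max(-threshold, min(threshold, s)) for s in signal]


def drop_packets(signal, packet_len, loss_prob, rng):
    """Zero out whole packets of `packet_len` samples with probability `loss_prob`."""
    out = list(signal)
    for start in range(0, len(out), packet_len):
        if rng.random() < loss_prob:
            end = min(start + packet_len, len(out))
            out[start:end] = [0.0] * (end - start)
    return out


# Toy input (assumed values): 100 ms of a 440 Hz sine sampled at 16 kHz.
sample_rate = 16000
clean = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(1600)]

# Hypothetical corruption settings, chosen only for illustration.
clipped = clip(clean, threshold=0.5)
lossy = drop_packets(clean, packet_len=320, loss_prob=0.2, rng=random.Random(0))
```

Both functions are non-additive corruptions in the sense used above: the degraded signal cannot be written as the clean signal plus an independent noise term, which is the scenario the abstract says goes beyond speech-in-noise.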

Published in IEEE Open Journal of Signal Processing

ISSN: 2644-1322 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=8782710
