Deep Learning Models for Single-Channel Speech Enhancement on Drones

Dmitrii Mukhutdinov; Ashish Alex; Andrea Cavallaro; Lin Wang

doi:10.1109/ACCESS.2023.3253719

IEEE Access (Jan 2023)

Affiliations

Dmitrii Mukhutdinov: Centre for Intelligent Sensing, Queen Mary University of London, London, U.K.
Ashish Alex: Centre for Intelligent Sensing, Queen Mary University of London, London, U.K.
Andrea Cavallaro: Centre for Intelligent Sensing, Queen Mary University of London, London, U.K.
Lin Wang: Centre for Intelligent Sensing, Queen Mary University of London, London, U.K.

DOI: https://doi.org/10.1109/ACCESS.2023.3253719
Journal volume & issue: Vol. 11, pp. 22993–23007

Abstract

Speech enhancement for drone audition is made challenging by the strong ego-noise from the rotating motors and propellers, which leads to extremely low signal-to-noise ratios (e.g., SNR $< -15$ dB) at onboard microphones. In this paper, we extensively assess the ability of single-channel deep learning approaches to reduce ego-noise on drones. We train twelve representative deep neural network (DNN) models, covering three operation domains (time-frequency magnitude domain, time-frequency complex domain and end-to-end time domain) and three distinct architectures (sequential, encoder-decoder and generative). We critically discuss and compare the performance of these models in extremely low-SNR scenarios, with input SNRs ranging from −30 dB to 0 dB. We show that time-frequency complex-domain processing and UNet encoder-decoder architectures outperform other approaches on speech enhancement measures while providing a good trade-off with other criteria, such as model size, computational complexity and context length. The best-performing model is a UNet model operating in the time-frequency complex domain, which, at an input SNR of −15 dB, improves ESTOI from 0.1 to 0.4, PESQ from 1.0 to 1.9 and SI-SDR from −15 dB to 3.7 dB. Based on the insights drawn from these findings, we discuss future research in drone ego-noise reduction.
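To make the reported numbers concrete, the following is a minimal Python sketch of how a mixture at a chosen input SNR can be built and how SI-SDR is commonly computed. It assumes the widely used scale-invariant SDR definition (projection of the estimate onto the reference); the abstract does not say that the paper's evaluation code is implemented this way. The function names `scale_noise_to_snr` and `si_sdr`, the sine-wave stand-in for speech and the Gaussian stand-in for ego-noise are all illustrative assumptions, not material from the paper.

```python
import math
import random


def scale_noise_to_snr(speech, noise, snr_db):
    """Scale `noise` so that the speech-to-noise power ratio equals `snr_db`."""
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(v * v for v in noise) / len(noise)
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [gain * v for v in noise]


def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB (common definition; signals assumed zero-mean)."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / ref_energy                      # optimal scaling of the reference
    target = [alpha * r for r in reference]       # part of the estimate explained by the reference
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(d * d for d in residual)
    return 10 * math.log10(target_energy / residual_energy)


# Toy signals (hypothetical): a 220 Hz tone in place of speech, white noise in place of ego-noise.
random.seed(0)
n = 16000
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(n)]
noise = [random.gauss(0.0, 1.0) for _ in range(n)]

scaled_noise = scale_noise_to_snr(speech, noise, -15.0)
noisy = [s + v for s, v in zip(speech, scaled_noise)]

# With no enhancement applied, SI-SDR of the mixture is close to the input SNR.
print(round(si_sdr(noisy, speech), 1))
```

For an unprocessed mixture, SI-SDR sits near the input SNR, which is consistent with the abstract quoting a starting SI-SDR of −15 dB at an input SNR of −15 dB. An enhancement model would replace `noisy` with its output before scoring.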

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
