IEEE Access (Jan 2023)

DeepLabV3+ Vision Transformer for Visual Bird Sound Denoising

  • Junhui Li,
  • Pu Wang,
  • Youshan Zhang

DOI: https://doi.org/10.1109/ACCESS.2023.3294476
Vol. 11, pp. 92540–92549

Abstract

Audio denoising aims to improve the perceptual quality of noisy audio signals. Residual noise often remains after denoising, which degrades the quality of the audio data. Both traditional and deep learning-based methods remain limited to manually added artificial noise or to low-frequency noise. Recently, audio denoising has been reformulated as an image segmentation problem, and deep neural networks have been applied to solve it. However, the performance of this approach has been limited by the shallow image segmentation models it relies on. This paper proposes a novel vision transformer model for visual bird sound denoising that combines a pyramid transformer with the DeepLabV3+ network (named PtDeepLab) to filter out noise. PtDeepLab builds on the pyramid transformer, which generates long-range, multi-scale representations. The model achieves intuitive noise reduction in audio, which helps separate clean audio from the mixed signal. Extensive experimental results show that the proposed model achieves better denoising performance than state-of-the-art methods.
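The abstract frames denoising as segmentation of a spectrogram image. The following is a minimal sketch of that general idea under our own assumptions; it is not the PtDeepLab pipeline. It computes a Hann-windowed STFT, multiplies it by a 0/1 time-frequency mask, and resynthesizes audio by weighted overlap-add. In the segmentation framing, a network would predict the mask from the spectrogram image. Here, as a stand-in, the mask is an oracle "ideal binary mask" computed from the known clean and noise signals. The function names, the synthetic tones, `n_fft=256`, and `hop=64` are all hypothetical, illustrative choices.

```python
import numpy as np


def stft(x, n_fft=256, hop=64):
    """Hann-windowed STFT; returns a complex (freq, time) spectrogram."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T  # shape: (n_fft // 2 + 1, n_frames)


def istft(spec, n_fft, hop, length):
    """Inverse STFT by weighted overlap-add, normalized by the summed squared window."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec.T, n=n_fft, axis=1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        start = i * hop
        out[start:start + n_fft] += frame * window
        norm[start:start + n_fft] += window ** 2
    covered = norm > 1e-8  # skip samples no window reaches (e.g. the first sample)
    out[covered] /= norm[covered]
    return out


def ideal_binary_mask(clean, noise, n_fft=256, hop=64):
    """Hypothetical oracle stand-in for a segmentation output: 1 where clean energy dominates a bin."""
    return (np.abs(stft(clean, n_fft, hop)) > np.abs(stft(noise, n_fft, hop))).astype(float)


def denoise_with_mask(noisy, mask, n_fft=256, hop=64):
    """Apply a time-frequency mask to the noisy spectrogram and resynthesize audio."""
    spec = stft(noisy, n_fft, hop)
    return istft(spec * mask, n_fft, hop, len(noisy))


def rms(v):
    return float(np.sqrt(np.mean(v ** 2)))


fs = 8000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 3000 * t)       # synthetic stand-in for a bird call
noise = 0.8 * np.sin(2 * np.pi * 200 * t)  # synthetic low-frequency hum
noisy = clean + noise

mask = ideal_binary_mask(clean, noise)
denoised = denoise_with_mask(noisy, mask)

inner = slice(256, -256)  # ignore edge samples covered by few windows
print("error before:", rms(noisy[inner] - clean[inner]))
print("error after: ", rms(denoised[inner] - clean[inner]))
```

Because the STFT is linear, masking the noisy spectrogram keeps the bins where the clean component dominates and discards the rest. A predicted mask from a segmentation model would be passed to `denoise_with_mask` in place of the oracle one. The transformer architecture itself is not sketched here.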

Keywords