Al-Khawarizmi Engineering Journal (Dec 2023)

An Overview of Audio-Visual Source Separation Using Deep Learning

  • Noorulhuda Mudhafar Sulaiman,
  • Ahmed Al Tmeme,
  • Mohammed Najah Mahdi

DOI
https://doi.org/10.22153/kej.2023.06.003
Journal volume & issue
Vol. 19, no. 4

Abstract

This article presents a general overview of deep learning-based audio-visual source separation (AVSS) systems. AVSS has achieved exceptional results in a number of areas, including reducing noise levels, improving speech recognition, and enhancing audio quality. The article reviews various recent experiments on AVSS and discusses the advantages and disadvantages of each deep learning model. It also summarizes useful datasets for testing AVSS systems, among them the TCD TIMIT dataset (high-quality audio and video recordings created especially for speech recognition tasks) and the VoxCeleb dataset (a sizable collection of short audio-visual clips of human speech). Overall, this review aims to highlight the growing importance of AVSS in improving the quality of audio signals.