IEEE Access (Jan 2020)

Speech Segregation in Background Noise Based on Deep Learning

  • Joseph Bamidele Awotunde
  • Roseline Oluwaseun Ogundokun
  • Femi Emmanuel Ayo
  • Opeyemi Emmanuel Matiluko

DOI
https://doi.org/10.1109/ACCESS.2020.3024077
Journal volume & issue
Vol. 8
pp. 169568–169575

Abstract

Speech is the most important way most people communicate. Beyond its linguistic content, speech conveys additional information such as the speaker's identity, emotion, and attitude, which makes it the most convenient and natural means of communication. Speech segregation, or speech processing in this sense, involves separating the desired speech from background noise. Recently, speech segregation has been formulated as a supervised learning problem, and the latest trend in speech processing is the use of deep learning systems to increase the computational speed and performance of speech processing tasks. Hence, this study employed a convolutional neural network to segregate speech from background noise. The convolutional neural network was used to characterize the speaker's vocal features and temporal dynamics. Unadapted (speaker-independent) models were first used to separate the two speech signals, and the separated signals were then used to estimate the input signal-to-noise ratio (SNR). The estimated SNR was in turn used to adapt the speaker models and re-estimate the speech signals, a procedure that iterated twice before convergence. The developed method was tested on the TIMIT dataset. The results showed the strength of the developed method for speech segregation in background noise and suggested that it enhances separation performance and converges reasonably fast. The system is simple and, under some input SNR conditions, performs better than state-of-the-art speech processing methods.
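The abstract outlines an iterative, SNR-guided adaptation loop: separate with unadapted models, estimate the SNR, adapt the speaker models, and re-estimate the signals. The Python/NumPy sketch below only illustrates that control flow under stated assumptions; the separation, SNR-estimation, and adaptation functions are hypothetical placeholders, not the authors' CNN-based implementation.

import numpy as np

def separate(mixture, model_a, model_b):
    # Placeholder separation step: split the mixture with per-speaker gain
    # masks derived from the (toy) speaker models.
    gain_a = model_a / (model_a + model_b + 1e-8)
    est_a = gain_a * mixture
    est_b = mixture - est_a
    return est_a, est_b

def estimate_snr(target, interference):
    # Estimate the SNR (in dB) between the two current speech estimates.
    p_t = np.mean(target ** 2) + 1e-12
    p_i = np.mean(interference ** 2) + 1e-12
    return 10.0 * np.log10(p_t / p_i)

def adapt_model(model, snr_db):
    # Placeholder adaptation: rescale the speaker model toward the level
    # implied by the estimated SNR (stand-in for the paper's adaptation step).
    return model * (10.0 ** (snr_db / 20.0))

# Toy mixture of two sinusoidal "speakers" plus background noise.
t = np.linspace(0.0, 1.0, 8000)
speaker_a = np.sin(2 * np.pi * 220 * t)
speaker_b = 0.5 * np.sin(2 * np.pi * 440 * t)
mixture = speaker_a + speaker_b + 0.05 * np.random.randn(t.size)

# Start from unadapted (speaker-independent) models; here just flat gains.
model_a = np.ones_like(mixture)
model_b = np.ones_like(mixture)

# The abstract reports that the procedure iterated twice before convergence.
for _ in range(2):
    est_a, est_b = separate(mixture, model_a, model_b)  # separate with current models
    snr_db = estimate_snr(est_a, est_b)                 # estimate SNR from the estimates
    model_a = adapt_model(model_a, snr_db)              # adapt each speaker model
    model_b = adapt_model(model_b, -snr_db)

print(f"Estimated SNR after adaptation: {snr_db:.1f} dB")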

Keywords