Mechanical Engineering Journal (Nov 2019)

Blind source separation by multilayer neural network classifiers for spectrogram analysis

  • Toshihiko SHIRAISHI,
  • Tomoki DOURA

DOI
https://doi.org/10.1299/mej.18-00527
Journal volume & issue
Vol. 6, no. 6
pp. 18-00527 – 18-00527

Abstract


This paper describes a novel method for blind source separation using multilayer neural networks, intended for audio signals recorded in a reverberant room or with moving signal sources. Speech-recognition specialists can identify the signal from a specific speaker in a recording of many speakers by analyzing a spectrogram of the recording, that is, a visual representation of the time series of frequency spectra of the signal. To apply multilayer neural networks to a similar classification task, the proposed method first computes a spectrogram of the mixed signal using the short-time Fourier transform and treats it as a visual object. The spectrogram is divided into small time-frequency segments, and the networks assign each segment to the class of its corresponding signal source. An inverse short-time Fourier transform then recovers the separated signals. The paper also evaluates the separation performance of this classification algorithm. Because the blind source separation problem is recast as a classification problem, multilayer neural network classifiers can be used without information about the mixing environment, statistical characteristics of the target signals, or multiple microphones. Simulated tests indicate that the proposed method achieves good separation performance under conditions with reverberation or moving signal sources. The proposed method may be adapted for separating signals from unknown convolutive mixtures and time-varying systems.
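
The pipeline summarized in the abstract (short-time Fourier transform, per-segment classification, inverse transform) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `stft`, `istft`, `separate`, and `toy_classifier` are hypothetical, each "segment" is assumed to be a single time-frequency bin, and a fixed frequency-threshold rule stands in for the trained multilayer neural network, which the abstract does not specify.

```python
import cmath
import math

def hann(n):
    # Periodic Hann window; with a hop of n/2 the shifted windows sum to 1.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def dft(frame):
    # Naive O(n^2) discrete Fourier transform (kept simple for clarity).
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    # Inverse DFT, returning the real part of each sample.
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def stft(x, n=32):
    # Spectrogram as a list of frames, each a list of n complex bins.
    hop, w = n // 2, hann(n)
    return [dft([x[s + i] * w[i] for i in range(n)])
            for s in range(0, len(x) - n + 1, hop)]

def istft(frames, n=32):
    # Overlap-add of the inverse-transformed frames.
    hop = n // 2
    out = [0.0] * (hop * (len(frames) - 1) + n)
    for j, spec in enumerate(frames):
        for i, v in enumerate(idft(spec)):
            out[j * hop + i] += v
    return out

def toy_classifier(j, k, spec):
    # ASSUMPTION: stand-in for the neural network classifier.
    # Assigns low-frequency bins to source 0 and the rest to source 1.
    n = len(spec[j])
    return 0 if min(k, n - k) < 6 else 1

def separate(x, classify, n_sources, n=32):
    # Keep, for each source, only the bins the classifier assigns to it,
    # then transform each masked spectrogram back to the time domain.
    spec = stft(x, n)
    outputs = []
    for src in range(n_sources):
        masked = [[v if classify(j, k, spec) == src else 0
                   for k, v in enumerate(frame)]
                  for j, frame in enumerate(spec)]
        outputs.append(istft(masked, n))
    return outputs
```

Calling `separate(mix, toy_classifier, 2)` on a single-channel mixture returns one time-domain estimate per source. With this window and hop, the first and last half-frames of each estimate are attenuated because only one window covers them. In the method the abstract describes, the classification step would be performed by trained multilayer neural networks rather than a fixed threshold.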

Keywords