Electronics Letters (Dec 2022)

Speaker front‐back disambiguity using multi‐channel speech signals

  • Xinyuan Qian,
  • Jichen Yang,
  • Alessio Brutti

DOI
https://doi.org/10.1049/ell2.12666
Journal volume & issue
Vol. 58, no. 25
pp. 1012 – 1015

Abstract

This paper tackles the front‐back disambiguation problem in speaker localization when the audio signals are captured by a symmetric microphone array. To this end, a deep neural network is proposed with an attention‐based mechanism designed to assign different weights to the features obtained from individual microphones. To support this work, a real dataset of synchronized multichannel audio signals, captured by a large linear microphone array and manually annotated, is introduced. The experimental results demonstrate the effectiveness of the proposed method over the other approaches. In particular, the Equal Error Rate (EER) is reduced by more than 50% compared with the single‐channel case. The designed multi‐channel self‐attention mechanism also brings further improvements. The dataset and source code will be released.
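
The idea of assigning different weights to per‐microphone features can be illustrated with a minimal sketch. This is not the authors' network or their multi‐channel self‐attention design, which the abstract does not specify; it is a simplified attention pooling under stated assumptions: each microphone contributes one feature vector, and a single `query` vector (hypothetical, fixed here rather than learned) scores each channel. The names `attention_pool`, `channel_features`, and `query` are illustrative only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(channel_features, query):
    """Fuse per-microphone feature vectors into a single vector.

    channel_features: list of C vectors, one per microphone, each of length D.
    query: hypothetical scoring vector of length D (would be learned in practice).
    Returns (weights, fused): C attention weights summing to 1, and the
    weighted sum of the channel vectors.
    """
    d = len(query)
    # Scaled dot-product score for each channel.
    scores = [
        sum(f * q for f, q in zip(feat, query)) / math.sqrt(d)
        for feat in channel_features
    ]
    weights = softmax(scores)
    # Weighted sum over channels, computed per feature dimension.
    fused = [
        sum(w * feat[k] for w, feat in zip(weights, channel_features))
        for k in range(d)
    ]
    return weights, fused


# Toy input: three microphones, two-dimensional features (made-up values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]
weights, fused = attention_pool(features, query)
```

In this toy input, the first and third channels align equally with `query` and so receive equal weights, both larger than the weight of the second channel. In a trained model the scores would come from learned parameters rather than a fixed vector.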