IEEE Open Journal of Signal Processing (Jan 2024)

Attention-Based End-to-End Differentiable Particle Filter for Audio Speaker Tracking

  • Jinzheng Zhao,
  • Yong Xu,
  • Xinyuan Qian,
  • Haohe Liu,
  • Mark D. Plumbley,
  • Wenwu Wang

DOI: https://doi.org/10.1109/OJSP.2024.3363649
Journal volume & issue: Vol. 5, pp. 449 – 458

Abstract

Particle filters (PFs) have been widely used in speaker tracking because they can model non-linear processes and non-Gaussian environments. However, PFs have several limitations. For example, they often rely on pre-defined handcrafted measurements, which can limit model performance. In addition, the transition and update models are often preset, which makes PFs less adaptable to different scenarios. To address these issues, we propose an end-to-end differentiable particle filter framework that employs multi-head attention to model long-range dependencies. The proposed model uses self-attention as the learned transition model and cross-attention as the learned update model. To our knowledge, this is the first proposal to combine a particle filter and a transformer for speaker tracking, with the measurement extraction, transition, and update steps integrated into an end-to-end architecture. Experimental results show that the proposed model outperforms the recurrent baseline models.
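
As a rough illustration of the idea in the abstract, the sketch below runs one filtering step in which self-attention over particle embeddings plays the role of the transition model and cross-attention from particles to measurement features plays the role of the update model. It is not the authors' implementation: it uses a single attention head without learned query/key/value projections (the paper uses multi-head attention), and the names `filter_step`, `w_score`, and `w_pos`, the residual connections, the embedding size, and the random inputs are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the maximum before exponentiating for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(query, key, value):
    # Single-head scaled dot-product attention (no learned projections).
    scores = query @ key.T / np.sqrt(query.shape[-1])
    return softmax(scores, axis=-1) @ value

def filter_step(particles, measurements, w_score, w_pos):
    # Transition (assumed form): particles attend to each other.
    predicted = particles + attention(particles, particles, particles)
    # Update (assumed form): particles query the measurement features.
    updated = predicted + attention(predicted, measurements, measurements)
    # Hypothetical readouts: one score per particle, normalised into weights,
    # and a 2-D position per particle.
    weights = softmax(updated @ w_score, axis=0)
    positions = updated @ w_pos
    # State estimate as the weighted mean of the particle positions.
    estimate = weights @ positions
    return updated, weights, estimate

# Toy inputs: 8 particles, 5 measurement frames, 16-dimensional embeddings.
rng = np.random.default_rng(0)
particles = rng.normal(size=(8, 16))
measurements = rng.normal(size=(5, 16))
w_score = rng.normal(size=16)
w_pos = rng.normal(size=(16, 2))

updated, weights, estimate = filter_step(particles, measurements, w_score, w_pos)
```

Every operation here is differentiable, so in a trainable version the readouts and attention projections could be learned jointly by backpropagating a tracking loss through the whole step, which is what "end-to-end" refers to in the abstract.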

Keywords