Complexity (Jan 2020)
Deep Metric Learning-Assisted 3D Audio-Visual Speaker Tracking via Two-Layer Particle Filter
Abstract
For speaker tracking, integrating multimodal information from audio and video provides an effective and promising solution. The main current challenge is the construction of a stable observation model. To this end, we propose a 3D audio-visual speaker tracker assisted by deep metric learning within a two-layer particle filter framework. First, an audio-guided motion model generates candidate samples in a hierarchical structure consisting of an audio layer and a visual layer. Then, a stable observation model is built on a purpose-designed Siamese network, which provides a similarity-based likelihood for computing particle weights. The speaker position is estimated from an optimal particle set that integrates the decisions of the audio and visual particles. Finally, a template update strategy based on a long short-term mechanism is adopted to prevent drift during tracking. Experimental results demonstrate that the proposed method outperforms single-modal trackers and comparison methods, achieving efficient and robust tracking both in 3D space and on the image plane.
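The following minimal Python sketch illustrates one way the similarity-based likelihood and the fusion of audio and visual particles described above could be realized. The Gaussian-style mapping from Siamese similarity to likelihood, the sigma value, and the function names (`particle_weights`, `fuse_estimate`) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def particle_weights(similarities, sigma=0.1):
    """Map Siamese similarity scores to normalized particle weights.

    Illustrative assumption: a Gaussian-style likelihood over (1 - similarity);
    the paper's exact likelihood form may differ.
    """
    likelihood = np.exp(-(1.0 - np.asarray(similarities)) ** 2 / (2.0 * sigma ** 2))
    return likelihood / likelihood.sum()

def fuse_estimate(audio_particles, audio_sims, visual_particles, visual_sims):
    """Estimate the speaker position from a particle set that merges
    audio-layer and visual-layer particles (illustrative fusion)."""
    particles = np.vstack([audio_particles, visual_particles])        # (N, 3) candidate positions
    weights = particle_weights(np.concatenate([audio_sims, visual_sims]))
    return weights @ particles                                        # weighted mean position

# Toy usage with synthetic particles scattered around a true 3D position
rng = np.random.default_rng(0)
true_pos = np.array([1.0, 0.5, 2.0])
audio_p = true_pos + 0.2 * rng.standard_normal((50, 3))
visual_p = true_pos + 0.05 * rng.standard_normal((50, 3))
audio_s = np.clip(1.0 - np.linalg.norm(audio_p - true_pos, axis=1), 0.0, 1.0)
visual_s = np.clip(1.0 - np.linalg.norm(visual_p - true_pos, axis=1), 0.0, 1.0)
print(fuse_estimate(audio_p, audio_s, visual_p, visual_s))
```

In this sketch the similarity scores stand in for the Siamese network's output; in the actual tracker they would be produced by comparing each particle's image patch against the maintained template.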