Head‐related transfer function–reserved time‐frequency masking for robust binaural sound source localization

Hong Liu; Peipei Yuan; Bing Yang; Ge Yang; Yang Chen

doi:10.1049/cit2.12010

CAAI Transactions on Intelligence Technology (Mar 2022)

Affiliations

Hong Liu: Key Laboratory of Machine Perception, Shenzhen Graduate School, Peking University, Shenzhen, China
Peipei Yuan: Key Laboratory of Machine Perception, Shenzhen Graduate School, Peking University, Shenzhen, China
Bing Yang: Key Laboratory of Machine Perception, Shenzhen Graduate School, Peking University, Shenzhen, China
Ge Yang: School of Artificial Intelligence, Chongqing University of Technology, Chongqing, China
Yang Chen: Yanka Kupala State University of Grodno, Grodno, Belarus

Journal volume & issue: Vol. 7, no. 1
pp. 26 – 33

Abstract

Various time‐frequency (T‐F) masks are being applied to sound source localization tasks. Moreover, deep learning has dramatically advanced T‐F mask estimation. However, existing masks are usually designed for speech separation tasks and are suitable only for single‐channel signals. A novel complex‐valued T‐F mask is proposed that reserves the head‐related transfer function (HRTF) and is customized for binaural sound source localization. In addition, because the convolutional neural network used to estimate the proposed mask takes binaural spectral information as both its input and its output, accurate binaural cues can be preserved. Compared with conventional T‐F masks, which emphasize T‐F units dominated by a single speech source, HRTF‐reserved masks eliminate the speech component while keeping the direct propagation path. Thus, more reliable localization features can be extracted from the estimated HRTF for the final direction of arrival estimation. Hence, binaural sound source localization guided by the proposed T‐F mask is robust under noisy and reverberant acoustic environments. The experimental results demonstrate that the new T‐F mask is superior to conventional T‐F masks and leads to better sound source localization performance in adverse environments.
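
The idea of a mask that removes the speech component and keeps the direct propagation path can be illustrated with a small NumPy sketch. This is not the authors' method: the abstract does not give the mask definition, and the paper estimates the mask with a convolutional neural network. The sketch instead assumes an oracle setting in which the direct-path HRTF is known, models each ear's spectrum as HRTF times source spectrum, and defines the mask as a regularized complex division. The function names, the array shapes, and the choice of interaural level and phase differences as localization features are all assumptions made for illustration.

```python
import numpy as np

def oracle_hrtf_reserved_mask(Y, H, eps=1e-8):
    """Hypothetical oracle mask: regularized complex division H / Y.

    Y: (2, T, F) complex STFT of the left/right ear signals.
    H: (2, F) complex direct-path HRTF (assumed known here).
    Returns M with shape (2, T, F) such that M * Y is close to H.
    """
    return H[:, None, :] * np.conj(Y) / (np.abs(Y) ** 2 + eps)

def interaural_cues(H_est, eps=1e-8):
    """Time-averaged interaural level (dB) and phase (rad) differences.

    H_est: (2, T, F) per-frame HRTF estimates for the two ears.
    """
    cross = (H_est[0] * np.conj(H_est[1])).mean(axis=0)
    ipd = np.angle(cross)
    left = np.abs(H_est[0]).mean(axis=0) + eps
    right = np.abs(H_est[1]).mean(axis=0) + eps
    ild = 20.0 * np.log10(left / right)
    return ild, ipd

# Synthetic, noise-free demo (assumed signal model: Y = H * S per ear).
rng = np.random.default_rng(0)
T, F = 50, 129
S = (1.0 + rng.random((T, F))) * np.exp(1j * rng.uniform(-np.pi, np.pi, (T, F)))
H = (0.5 + rng.random((2, F))) * np.exp(1j * rng.uniform(-np.pi, np.pi, (2, F)))
Y = H[:, None, :] * S[None, :, :]

M = oracle_hrtf_reserved_mask(Y, H)   # complex-valued mask, one per ear
H_est = M * Y                         # speech component cancelled, HRTF kept
ild, ipd = interaural_cues(H_est)
```

In this noise-free toy case the masked spectrum reduces to the HRTF itself, so the interaural cues no longer depend on the speech content. With noise and reverberation the oracle mask is unavailable, which is where a learned estimator such as the one described in the abstract would come in.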

Published in CAAI Transactions on Intelligence Technology

ISSN: 2468-2322 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Language and Literature: Philology. Linguistics: Computational linguistics. Natural language processing; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/24682322
