Modelling individual head‐related transfer function (HRTF) based on anthropometric parameters and generic HRTF amplitudes

Rui Zhang; Ruijie Meng; Jinqiu Sang; Yi Hu; Xiaodong Li; Chengshi Zheng

doi:10.1049/cit2.12196

CAAI Transactions on Intelligence Technology (Jun 2023)

Affiliations

Rui Zhang: Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China
Ruijie Meng: Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China
Jinqiu Sang: Shanghai Institute of AI for Education, East China Normal University, Shanghai, China
Yi Hu: Department of Electrical Engineering and Computer Science, University of Wisconsin–Milwaukee, Milwaukee, Wisconsin, USA
Xiaodong Li: Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China
Chengshi Zheng: Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China

DOI: https://doi.org/10.1049/cit2.12196
Journal volume & issue: Vol. 8, no. 2
pp. 364 – 378

Abstract

The head‐related transfer function (HRTF) plays a vital role in immersive virtual reality and augmented reality technologies, especially in spatial audio synthesis for binaural reproduction. This article proposes a deep learning method with generic HRTF amplitudes and anthropometric parameters as input features for individual HRTF generation. By designing fully convolutional neural networks, the key anthropometric parameters and the generic HRTF amplitudes were used to predict each individual HRTF amplitude spectrum in the full‐space directions, and the interaural time delay (ITD) was predicted by the transformer module. In the amplitude prediction model, the attention mechanism was adopted to better capture the relationship between HRTF amplitude spectra at two distinct directions with large angle differences in space. Finally, with the minimum phase model, the predicted amplitude spectra and ITDs were used to obtain a set of individual head‐related impulse responses. Besides the separate training of the HRTF amplitude and ITD generation models, their joint training was also considered and evaluated. The root‐mean‐square error and the log‐spectral distortion were used as objective metrics to evaluate performance. Subjective experiments further showed that the auditory source localisation performance of the proposed method was better than that of other methods in most cases.
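
The final reconstruction step and the log‐spectral distortion metric named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a standard cepstral (homomorphic) construction of the minimum‐phase response, an integer‐sample delay for the ITD, a sign convention in which a positive ITD means the right ear lags, and the common RMS‐in‐dB form of the log‐spectral distortion. All function names are hypothetical.

```python
import numpy as np

def minimum_phase_hrir(mag, n_fft):
    """Build a minimum-phase impulse response from a one-sided amplitude
    spectrum (n_fft // 2 + 1 bins, n_fft even) via the real cepstrum."""
    log_mag = np.log(np.maximum(mag, 1e-12))      # floor avoids log(0)
    cepstrum = np.fft.irfft(log_mag, n=n_fft)     # real, even cepstrum
    fold = np.zeros(n_fft)                        # fold anti-causal part
    fold[0] = 1.0
    fold[1:n_fft // 2] = 2.0
    fold[n_fft // 2] = 1.0
    min_phase_spectrum = np.exp(np.fft.fft(fold * cepstrum))
    return np.fft.ifft(min_phase_spectrum).real

def apply_itd(hrir_left, hrir_right, itd_seconds, fs):
    """Delay the lagging ear by the ITD rounded to whole samples.
    Assumed convention: positive ITD means the right ear lags."""
    delay = int(round(abs(itd_seconds) * fs))

    def shift(h, d):
        # Prepend d zeros and keep the original length.
        return np.concatenate([np.zeros(d), h])[:len(h)]

    if itd_seconds >= 0:
        return hrir_left, shift(hrir_right, delay)
    return shift(hrir_left, delay), hrir_right

def log_spectral_distortion(mag_ref, mag_est):
    """RMS difference in dB between two amplitude spectra."""
    ratio = np.maximum(mag_ref, 1e-12) / np.maximum(mag_est, 1e-12)
    diff_db = 20.0 * np.log10(ratio)
    return float(np.sqrt(np.mean(diff_db ** 2)))

# Usage: recover a known minimum-phase filter from its amplitude only.
n_fft = 256
h_true = np.zeros(n_fft)
h_true[0], h_true[1] = 1.0, 0.5                   # zero at z = -0.5
mag = np.abs(np.fft.rfft(h_true))
h_rec = minimum_phase_hrir(mag, n_fft)
left, right = apply_itd(h_rec, h_rec, itd_seconds=0.0005, fs=44100)
lsd = log_spectral_distortion(mag, np.abs(np.fft.rfft(h_rec)))
```

In this sketch the amplitude spectrum would come from the amplitude prediction model and `itd_seconds` from the ITD model; a fractional‐delay filter could replace the integer shift where sub‐sample accuracy matters.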

Published in CAAI Transactions on Intelligence Technology

ISSN: 2468-2322 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Language and Literature: Philology. Linguistics: Computational linguistics. Natural language processing; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/24682322
