IEEE Access (Jan 2023)

PRTFNet: HRTF Individualization for Accurate Spectral Cues Using a Compact PRTF

  • Byeong-Yun Ko,
  • Gyeong-Tae Lee,
  • Hyeonuk Nam,
  • Yong-Hwa Park

DOI
https://doi.org/10.1109/ACCESS.2023.3308143
Journal volume & issue
Vol. 11
pp. 96119 – 96130

Abstract

Spatial audio rendering relies on accurate localization perception, which requires individual head-related transfer functions (HRTFs). Previous methods based on deep neural networks (DNNs) for predicting HRTF magnitude spectra from pinna images used the HRTF log-magnitude as the network output during the training stage. However, HRTFs encompass the acoustical characteristics of the head and torso, making it challenging to reconstruct the spectral cues necessary for elevation localization. To tackle this issue, we propose PRTFNet, which reconstructs the individual spectral cues in HRTFs by mitigating the influence of the head and torso. PRTFNet consists of an end-to-end convolutional neural network (CNN) model and uses as network output a compact pinna-related transfer function (PRTF), which eliminates the impact of sound reflections from the head and torso in the head-related impulse response (HRIR). Additionally, we introduce HRTF phase personalization, a technique that takes the phase spectrum of a selected HRTF from a database and adjusts it by multiplying the phase by the ratio of the target listener's head width to that of the subject of the selected HRTF. We evaluated the proposed HRTF individualization methods on the HUTUBS dataset, and the results demonstrate that PRTFNet is highly effective in reconstructing the first and second spectral cues. In terms of log spectral distortion (LSD) and effective LSD (LSDE), PRTFNet outperforms previous deep-learning-based models. Furthermore, multiplying the selected phase by the head-width ratio reduces the root mean square error (RMSE) of the interaural time difference (ITD) by 0.003 ms.
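The phase-personalization step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the complex-spectrum input format, and the head-width parameters are all assumptions for the example. The idea is to keep the magnitude of an HRTF selected from a database and scale its unwrapped phase by the listener-to-subject head-width ratio.

```python
import numpy as np

def personalize_hrtf_phase(selected_hrtf, target_head_width, selected_head_width):
    """Hypothetical sketch of HRTF phase personalization.

    selected_hrtf: complex one-sided frequency spectrum of an HRTF
                   chosen from a database.
    The magnitude is kept; the unwrapped phase is scaled by the ratio
    of the target listener's head width to that of the database subject,
    which proportionally rescales the implied time delay (and hence ITD).
    """
    magnitude = np.abs(selected_hrtf)
    phase = np.unwrap(np.angle(selected_hrtf))
    scaled_phase = phase * (target_head_width / selected_head_width)
    return magnitude * np.exp(1j * scaled_phase)
```

For a pure-delay (linear-phase) spectrum, scaling the phase this way is equivalent to scaling the delay itself by the same head-width ratio, which is the mechanism by which the ITD is adapted to the target listener.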

Keywords