IEEE Access (Jan 2023)
PRTFNet: HRTF Individualization for Accurate Spectral Cues Using a Compact PRTF
Abstract
Spatial audio rendering relies on accurate localization perception, which requires individual head-related transfer functions (HRTFs). Previous methods based on deep neural networks (DNNs) for predicting HRTF magnitude spectra from pinna images used the HRTF log-magnitude as the network output during training. However, HRTFs also encompass the acoustical characteristics of the head and torso, which makes it challenging to reconstruct the spectral cues necessary for elevation localization. To tackle this issue, we propose PRTFNet, which reconstructs the individual spectral cues in HRTFs by mitigating the influence of the head and torso. PRTFNet consists of an end-to-end convolutional neural network (CNN) model and uses as its network output a compact pinna-related transfer function (PRTF), which eliminates the impact of sound reflections from the head and torso in the head-related impulse response (HRIR). Additionally, we introduce HRTF phase personalization, a technique that takes the phase spectra of an HRTF set selected from a database and adjusts the phase by multiplying it by the ratio of the target listener’s head width to that of the subject of the selected HRTFs. We evaluated the proposed HRTF individualization methods using the HUTUBS dataset, and the results demonstrate that PRTFNet is highly effective in reconstructing the first and second spectral cues. In terms of log spectral distortion (LSD) and effective LSD (LSDE), PRTFNet outperforms a previous deep learning-based model. Furthermore, multiplying the selected phase by the head width ratio reduces the root mean square error (RMSE) of the interaural time difference (ITD) by 0.003 ms.
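The phase personalization and the LSD metric summarized above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the use of an unwrapped phase, and the LSD definition (root mean square of the dB magnitude difference over frequency bins) are assumptions made for illustration, and the exact formulation in the paper may differ.

```python
import numpy as np


def personalize_phase(selected_hrtf, target_head_width, selected_head_width):
    """Scale the phase of a selected HRTF by the head-width ratio.

    Assumption: the phase is unwrapped along frequency before scaling,
    so that a pure delay is stretched or compressed by the same ratio.
    """
    phase = np.unwrap(np.angle(selected_hrtf))
    return phase * (target_head_width / selected_head_width)


def log_spectral_distortion(h_true, h_pred, eps=1e-12):
    """RMS difference in dB between two magnitude spectra (assumed LSD form)."""
    mag_true = np.abs(h_true) + eps  # eps guards against log of zero
    mag_pred = np.abs(h_pred) + eps
    diff_db = 20.0 * np.log10(mag_true / mag_pred)
    return float(np.sqrt(np.mean(diff_db ** 2)))


# Hypothetical example: a pure 0.3 ms delay stands in for a selected HRTF.
freqs = np.linspace(0.0, 22050.0, 129)            # frequency bins in Hz
selected = np.exp(-2j * np.pi * freqs * 3e-4)     # unit magnitude, linear phase
new_phase = personalize_phase(selected, 16.0, 14.5)  # head widths in cm (made up)

# Combine a predicted magnitude (here simply flat) with the personalized phase.
predicted_magnitude = np.ones_like(freqs)
individualized = predicted_magnitude * np.exp(1j * new_phase)
```

Under these assumptions, scaling a linear phase by the head-width ratio scales the corresponding delay by the same ratio, which is why the operation affects the ITD.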
Keywords