HRTFformer: A Spatially-Aware Transformer for Individual HRTF Upsampling in Immersive Audio Rendering
Abstract:
As commercial immersive audio platforms increasingly adopt individual Head-Related Transfer Functions (HRTFs), these functions have become indispensable for realistic spatial audio rendering. However, their widespread deployment faces a significant barrier: the measurement process is too complex and resource-intensive to scale. HRTF spatial upsampling, which reconstructs a dense set of HRTFs from measurements at a sparse set of directions, has emerged as a way to reduce the number of required measurements. Although previous machine learning (ML) approaches have achieved some success, they often struggle to preserve consistent spatial variation patterns between adjacent source directions over long angular ranges, and to generalize at high upsampling rates.
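To make the upsampling task concrete, the sketch below shows a classical, non-learned spherical harmonic (SH) interpolation: it fits low-order real SH coefficients per frequency bin to magnitudes known at a few directions, then evaluates the fit on a dense grid. This is not HRTFformer. The order-2 truncation, the Fibonacci-lattice stand-in for a measurement grid, the helper names (`real_sh_order2`, `fibonacci_sphere`, `sh_upsample`) and the synthetic magnitudes are all illustrative assumptions.

```python
import numpy as np


def real_sh_order2(dirs):
    """Real, orthonormal spherical harmonics up to order 2.

    dirs: (N, 3) array of unit vectors (x, y, z).
    Returns an (N, 9) matrix with columns ordered (n, m) = (0, 0), (1, -1), (1, 0),
    (1, 1), (2, -2), (2, -1), (2, 0), (2, 1), (2, 2).
    """
    x, y, z = dirs[:, 0], dirs[:, 1], dirs[:, 2]
    c0 = 0.5 * np.sqrt(1.0 / np.pi)
    c1 = np.sqrt(3.0 / (4.0 * np.pi))
    c2 = 0.5 * np.sqrt(15.0 / np.pi)
    c20 = 0.25 * np.sqrt(5.0 / np.pi)
    c22 = 0.25 * np.sqrt(15.0 / np.pi)
    return np.stack([
        np.full_like(x, c0),
        c1 * y, c1 * z, c1 * x,
        c2 * x * y, c2 * y * z, c20 * (3.0 * z**2 - 1.0),
        c2 * x * z, c22 * (x**2 - y**2),
    ], axis=1)


def fibonacci_sphere(n):
    """n roughly uniform unit directions (hypothetical stand-in for a measurement grid)."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n
    r = np.sqrt(1.0 - z**2)
    phi = np.pi * (1.0 + np.sqrt(5.0)) * i  # golden-angle azimuth increments
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)


def sh_upsample(sparse_dirs, sparse_mag, dense_dirs):
    """Least-squares SH fit per frequency bin, evaluated at dense_dirs.

    sparse_mag: (N_sparse, F) magnitudes; returns an (N_dense, F) estimate.
    """
    Y_sparse = real_sh_order2(sparse_dirs)                          # (N_sparse, 9)
    coeffs, *_ = np.linalg.lstsq(Y_sparse, sparse_mag, rcond=None)  # (9, F)
    return real_sh_order2(dense_dirs) @ coeffs                      # (N_dense, F)


# Toy example: 20 "measured" directions, 500 target directions, 4 frequency bins.
sparse_dirs = fibonacci_sphere(20)
dense_dirs = fibonacci_sphere(500)


def toy_magnitudes(d):
    # Synthetic, smoothly varying magnitudes (not real HRTF data). This field lies
    # in the span of order <= 2 SH, so the least-squares fit can recover it.
    freqs = np.linspace(1.0, 2.0, 4)
    return 1.0 + 0.3 * d[:, [0]] * freqs + 0.1 * d[:, [2]] ** 2


dense_est = sh_upsample(sparse_dirs, toy_magnitudes(sparse_dirs), dense_dirs)
max_err = np.max(np.abs(dense_est - toy_magnitudes(dense_dirs)))
print(max_err)
```

An order-N fit has (N + 1)^2 unknowns per frequency bin, so it needs at least that many measured directions. Real HRTFs require much higher orders than this toy field, which is why sparse inputs make plain SH interpolation difficult.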
In this study, we introduce HRTFformer, a novel transformer-based architecture for HRTF upsampling. Its attention mechanism captures spatial correlations across the entire HRTF sphere. Operating in the spherical harmonic (SH) domain, the model reconstructs high-resolution HRTFs from sparse input data with substantially higher accuracy than existing methods. To further improve spatial coherence, we introduce a neighbor dissimilarity loss that encourages smooth magnitude variation between neighboring directions, yielding more natural upsampled HRTFs. We evaluate the method with both perceptual localization models and objective spectral distortion metrics. Experimental results show that HRTFformer outperforms current state-of-the-art techniques across multiple evaluation criteria, producing high-fidelity HRTFs that are perceptually realistic.
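The abstract does not give a formula for the neighbor dissimilarity loss, so the sketch below assumes one plausible form for illustration only: the mean squared difference, in dB, between each direction's magnitude response and those of its k nearest neighbors on the sphere. It also computes the log-spectral distortion (LSD), a common objective spectral distortion metric for HRTFs; whether the paper uses exactly this LSD variant is an assumption. The names `knn_indices`, `neighbor_dissimilarity` and `log_spectral_distortion`, the choice of k, and the toy data are hypothetical.

```python
import numpy as np


def knn_indices(dirs, k):
    """Indices of the k nearest neighbours of each unit direction, excluding itself."""
    cos = dirs @ dirs.T                      # larger cosine = smaller great-circle angle
    np.fill_diagonal(cos, -np.inf)           # a direction is never its own neighbour
    return np.argsort(-cos, axis=1)[:, :k]   # (N, k)


def neighbor_dissimilarity(mag, neighbors, eps=1e-8):
    """Hypothetical smoothness penalty: mean squared dB gap between neighbours.

    mag: (N, F) linear magnitudes; neighbors: (N, k) integer indices.
    """
    db = 20.0 * np.log10(mag + eps)          # (N, F) log-magnitude in dB
    diff = db[:, None, :] - db[neighbors]    # (N, k, F) gap to each neighbour
    return float(np.mean(diff ** 2))


def log_spectral_distortion(mag_true, mag_est, eps=1e-8):
    """LSD in dB: RMS over frequency bins of the dB ratio, averaged over directions."""
    ratio_db = 20.0 * np.log10((mag_true + eps) / (mag_est + eps))
    return float(np.mean(np.sqrt(np.mean(ratio_db ** 2, axis=1))))


# Toy example with random directions and synthetic magnitudes (not real HRTFs).
rng = np.random.default_rng(0)
dirs = rng.normal(size=(200, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
nbrs = knn_indices(dirs, k=6)

smooth = np.exp(0.5 * dirs[:, [0]]) * np.ones((1, 16))      # varies slowly with direction
noisy = smooth * rng.uniform(0.5, 2.0, size=smooth.shape)    # adds direction-wise jitter

print(neighbor_dissimilarity(smooth, nbrs), neighbor_dissimilarity(noisy, nbrs))
print(log_spectral_distortion(smooth, noisy))
```

Computing the penalty in dB keeps it on the same scale as LSD. In training, a term like this would presumably be added to a reconstruction loss with some weighting factor; that weighting is likewise an assumption, not something the abstract specifies.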
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




