Systematic Evaluation of Time-Frequency Features for Binaural Sound Source Localization
Title: A Comprehensive Assessment of Time-Frequency Representations for Binaural Sound Source Localization
Abstract: This study systematically examines time-frequency feature design for binaural sound source localization (SSL), focusing on how feature selection affects model performance under different conditions. We evaluate a convolutional neural network (CNN) trained on various combinations of amplitude-based features (magnitude spectrogram and interaural level difference, ILD) and phase-based features (phase spectrogram and interaural phase difference, IPD). Experiments on in-domain data and on out-of-domain data with mismatched head-related transfer functions (HRTFs) show that carefully chosen feature combinations often outperform increases in model complexity. Two-feature sets such as ILD combined with IPD are sufficient for in-domain SSL, whereas robust generalization to diverse content requires richer inputs that combine the channel spectrograms with both ILD and IPD. With these optimal feature sets, our low-complexity CNN achieves performance comparable to that of more complex models. These findings underscore the importance of feature design in binaural SSL and offer practical guidance for both domain-specific and general-purpose localization.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
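The four feature types named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' pipeline: the STFT settings (`n_fft`, `hop`, Hann window), the dB scaling of the ILD, the `eps` stabilizer, and the helper names `stft`, `binaural_features`, and `stack_features` are all assumptions chosen for illustration. The sketch takes the ILD as the log-magnitude ratio of the left and right spectrograms and the IPD as the phase of the cross-spectrum, which are common definitions but may differ from those used in the paper.

```python
import numpy as np


def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT of a 1-D signal; returns shape (freq_bins, frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1).T


def binaural_features(left, right, n_fft=512, hop=256, eps=1e-8):
    """Compute per-channel and interaural features from a binaural pair.

    Each entry has shape (channels, freq_bins, frames).
    """
    L = stft(left, n_fft, hop)
    R = stft(right, n_fft, hop)
    mag = np.stack([np.abs(L), np.abs(R)])        # magnitude spectrograms
    phase = np.stack([np.angle(L), np.angle(R)])  # phase spectrograms
    # ILD in dB (assumed convention: left relative to right).
    ild = 20.0 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))
    # IPD as the phase of the cross-spectrum, wrapped to (-pi, pi].
    ipd = np.angle(L * np.conj(R))
    return {"mag": mag, "phase": phase, "ild": ild[None], "ipd": ipd[None]}


def stack_features(feats, names):
    """Concatenate the selected features along the channel axis for a CNN."""
    return np.concatenate([feats[name] for name in names], axis=0)
```

Under these assumptions, `stack_features(feats, ["ild", "ipd"])` gives a two-channel input corresponding to the dual-feature configuration, while `stack_features(feats, ["mag", "phase", "ild", "ipd"])` gives a six-channel input that is one possible reading of combining the channel spectrograms with both ILD and IPD.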





