Global News Digest

arXiv

VocSim: A Training-free Benchmark for Zero-shot Content Identity in Single-source Audio

Abstract:

General-purpose audio representations are designed to map acoustically variable instances of identical events to proximate points, thereby resolving content identity within a zero-shot framework. In contrast to supervised classification benchmarks that assess adaptability through parameter updates, we present VocSim, a training-free benchmark that examines the intrinsic geometric alignment of frozen embeddings. This approach operates without parameter updates or labeled data, utilizing only a label-free PCA whitening step fitted per subset to correct for anisotropy.
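
The label-free whitening step can be illustrated with a minimal sketch. It assumes plain PCA whitening fitted on a single subset of embeddings; the function name `pca_whiten`, the `eps` stabilizer, and the synthetic data are hypothetical and not taken from the VocSim code.

```python
import numpy as np

def pca_whiten(X, eps=1e-8):
    """Whiten X (n_samples, n_features) using only its own statistics, no labels."""
    Xc = X - X.mean(axis=0, keepdims=True)      # center each feature
    cov = Xc.T @ Xc / (X.shape[0] - 1)          # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigendecomposition (symmetric matrix)
    W = eigvecs / np.sqrt(eigvals + eps)        # scale each principal axis to unit variance
    return Xc @ W

# Synthetic, strongly anisotropic "embeddings" standing in for one subset.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4)) @ np.diag([10.0, 3.0, 1.0, 0.1])
Z = pca_whiten(X)
cov_Z = np.cov(Z, rowvar=False)                 # close to the identity after whitening
```

After whitening, every direction carries roughly equal variance, so a few dominant axes no longer drive distance comparisons.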

VocSim consolidates 125,000 single-source audio clips drawn from 19 distinct corpora, covering human speech, animal vocalizations, and environmental sounds. The benchmark explicitly isolates content representation from source separation tasks, excluding polyphonic mixtures from its scope. We assess embedding quality using Precision@k to measure local purity and the Global Separation Rate (GSR) to evaluate point-wise class separation, with GSR values calibrated against an empirical permutation baseline to determine lift.
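
As a rough illustration of local purity, the sketch below computes Precision@k under cosine similarity and compares it with a label-permutation baseline. The abstract applies the permutation baseline to GSR, whose definition is not given here, so the same idea is shown on Precision@k instead. The distance choice, function names, and toy data are all assumptions.

```python
import numpy as np

def precision_at_k(emb, labels, k):
    """Mean fraction of each item's k nearest neighbors that share its label."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # unit-normalize rows
    sim = emb @ emb.T                                       # cosine similarity
    np.fill_diagonal(sim, -np.inf)                          # exclude the query itself
    nn = np.argsort(-sim, axis=1)[:, :k]                    # indices of top-k neighbors
    return float((labels[nn] == labels[:, None]).mean())

def permutation_baseline(emb, labels, k, n_perm=100, seed=0):
    """Average Precision@k after shuffling labels (an empirical chance level)."""
    rng = np.random.default_rng(seed)
    scores = [precision_at_k(emb, rng.permutation(labels), k) for _ in range(n_perm)]
    return float(np.mean(scores))

# Toy data: two tight, well-separated clusters of 10 points each.
rng = np.random.default_rng(1)
centers = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = np.repeat([0, 1], 10)
emb = centers[labels] + 0.01 * rng.normal(size=(20, 2))

p_at_3 = precision_at_k(emb, labels, k=3)
chance = permutation_baseline(emb, labels, k=3)
lift = p_at_3 - chance   # how far the score sits above the shuffled-label baseline
```

Reporting the score relative to the shuffled-label baseline makes results comparable across subsets with different numbers of classes and class sizes.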

A straightforward pipeline comprising frozen Whisper features, time-frequency pooling, and label-free PCA demonstrates robust zero-shot performance, maintaining stable GSR rankings across various domains (Kendall's tau = 0.60). However, performance on blind, low-resource speech datasets (specifically Shipibo-Conibo and Chintang) reveals a collapse in local retrieval capabilities, though results remain above chance levels, highlighting a cross-lingual speech generalization gap. As external validation, our top-performing embeddings accurately predict avian perceptual similarity, enhance bioacoustic classification, and achieve state-of-the-art results on the HEAR benchmark. We publicly release the associated data, code, and leaderboard.
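
Ranking stability of the kind summarized by Kendall's tau can be checked with a short sketch. It implements the tie-free tau-a variant in pure Python; the two rank lists are made-up examples for five hypothetical models in two domains, not benchmark results.

```python
from itertools import combinations

def kendall_tau(a, b):
    """Kendall's tau-a for two equal-length lists of ranks or scores without ties."""
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        s = (a[i] - a[j]) * (b[i] - b[j])
        if s > 0:
            concordant += 1      # pair ordered the same way in both lists
        elif s < 0:
            discordant += 1      # pair ordered in opposite ways
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical ranks of five models in two domains (1 = best).
ranks_domain_a = [1, 2, 3, 4, 5]
ranks_domain_b = [1, 3, 2, 5, 4]
tau = kendall_tau(ranks_domain_a, ranks_domain_b)  # 8 concordant, 2 discordant pairs -> 0.6
```

A tau of 1 means the two domains rank the models identically, 0 means no association, and -1 means the order is fully reversed.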


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
