arXiv

Drift-Augmented Scoring: Text-Derived Noise Robustness for Zero-Shot Audio-Language Classification

Title: Enhancing Zero-Shot Audio-Language Classification with Text-Derived Noise Robustness via Drift-Augmented Scoring

Abstract:

Contrastive audio-language models, exemplified by CLAP, facilitate zero-shot audio classification by assigning labels to sounds based on the similarity between audio embeddings and text prompt embeddings, eliminating the need for labeled audio data. However, this matching mechanism is highly susceptible to acoustic noise, leading to significant performance drops. On standard benchmarks, accuracy and mean Average Precision (mAP) decrease by 12 to 30 percentage points when the signal-to-noise ratio (SNR) reaches 0 dB.
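The similarity-based matching described above can be illustrated with a minimal NumPy sketch. It assumes the audio and text embeddings have already been produced by a CLAP-style encoder; the function names `l2_normalize` and `zero_shot_predict` and the toy 3-dimensional vectors are hypothetical stand-ins, not part of any CLAP library.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)

def zero_shot_predict(audio_emb, text_embs):
    """Return (predicted class index, per-class cosine similarities).

    audio_emb: shape (D,), embedding of one audio clip.
    text_embs: shape (C, D), one text prompt embedding per class.
    """
    a = l2_normalize(audio_emb)
    t = l2_normalize(text_embs)
    sims = t @ a  # cosine similarity to each class prompt
    return int(np.argmax(sims)), sims

# Toy embeddings standing in for real encoder outputs (assumption).
text_embs = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])
audio_emb = np.array([0.2, 0.9, 0.1])
pred, sims = zero_shot_predict(audio_emb, text_embs)
```

No labeled audio is involved: adding a class only requires embedding one more text prompt.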

To address this vulnerability, we introduce Drift-Augmented Scoring (DAS). This method adds a small per-class bonus to the cosine similarity score. The bonus is awarded when the embedding of noisy audio shifts in the direction anticipated by the class’s noise-conditioned text prompts. Because this bonus is derived exclusively from text data, it can be calculated once and cached, so inference requires only a single inner product per class. Notably, DAS requires neither gradients nor test-time batching.
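The abstract does not state the scoring formula, so the following is only one plausible reading of it, sketched under stated assumptions: the drift direction for a class is taken to be the normalized difference between the mean of its noise-conditioned prompt embeddings and its clean prompt embedding, and the bonus is a weight `lam` times the inner product of the audio embedding with that direction. The names `cache_drift_directions`, `das_scores`, and `lam`, and the toy vectors, are hypothetical.

```python
import numpy as np

def unit(x, eps=1e-12):
    """L2-normalize along the last axis."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def cache_drift_directions(clean_text_embs, noisy_text_embs):
    """Text-only precomputation (assumed form, not confirmed by the abstract).

    clean_text_embs: (C, D), one clean prompt embedding per class.
    noisy_text_embs: (C, P, D), P noise-conditioned prompt embeddings per class.
    Returns (C, D) unit vectors pointing from clean toward noisy prompts.
    """
    clean = unit(clean_text_embs)
    noisy_mean = unit(noisy_text_embs).mean(axis=1)
    return unit(noisy_mean - clean)

def das_scores(audio_emb, clean_text_embs, drift_dirs, lam=0.1):
    """Cosine similarity plus a small per-class drift bonus (assumed form)."""
    a = unit(audio_emb)
    cosine = unit(clean_text_embs) @ a
    bonus = drift_dirs @ a  # one extra inner product per class
    return cosine + lam * bonus

# Toy data: two classes that tie on cosine similarity alone.
clean = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
noisy = np.array([[[1.0, 0.0, 1.0]],
                  [[0.0, 1.0, -1.0]]])
dirs = cache_drift_directions(clean, noisy)  # computed once, then cached
audio = np.array([0.6, 0.6, 0.5])
base = das_scores(audio, clean, dirs, lam=0.0)
scores = das_scores(audio, clean, dirs, lam=0.1)
```

In this toy case the audio embedding leans toward the drift expected for class 0, so the bonus breaks the tie in its favor. Everything in `cache_drift_directions` depends only on text, which is why it can be cached ahead of inference.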

We evaluated DAS using a LAION CLAP backbone, comparing it against four variants of the concurrent method proposed by Acevedo et al. The evaluation utilized the UrbanSound8K dataset and the complete FSD50K evaluation set, introducing urban acoustic scene noise to audio clips across various SNR levels. DAS demonstrated consistent improvements across all test conditions, boosting accuracy by 2.60 to 5.75 points on UrbanSound8K and increasing mAP by 1.50 to 1.74 points on FSD50K.
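Mixing noise into a clip at a target SNR is a standard procedure and can be sketched as follows. This is a generic illustration, not the authors' evaluation pipeline: the function name `mix_at_snr`, the synthetic sine tone, and the random noise are assumptions, and the noise is assumed to have nonzero power.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Add noise to a clean signal so the result has the requested SNR in dB."""
    # Loop the noise if it is shorter than the clip, then trim to length.
    if len(noise) < len(clean):
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[:len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)  # assumed nonzero
    # Choose the gain so that p_clean / (gain**2 * p_noise) == 10**(snr_db / 10).
    gain = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + gain * noise

# Synthetic stand-ins for an audio clip and a noise recording (assumption).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)
noise = rng.standard_normal(4000)
mixed = mix_at_snr(clean, noise, snr_db=0.0)

# Recover the SNR actually achieved from the added component.
added = mixed - clean
measured_snr_db = 10 * np.log10(np.mean(clean ** 2) / np.mean(added ** 2))
```

At 0 dB the scaled noise carries the same average power as the clip, which is the condition under which the abstract reports drops of 12 to 30 percentage points.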


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
