arXiv

Beyond Text Following: Repairable Arbitration Reversals in Audio-Language Models

Title: Repairable Arbitration Reversals in Audio-Language Models: Moving Beyond Text Following

Abstract:

Audio-language models (ALMs) frequently follow the textual input over audio cues, even when the auditory evidence is unambiguous. This raises a basic question: is the audio-supported response genuinely absent from the model’s internal representation, or is it present but suppressed by the conflicting text? To investigate, we use a same-audio counterfactual: the audio is held constant while the conflicting text is removed, and we measure how the model's preference between candidate answers shifts.

Across five ALMs and four conflict-based tasks, 64.1% of conflicting samples exhibit a sign flip: the same-audio branch favors the audio-supported answer, while the joint branch (with both modalities present) favors the text-supported answer. This pattern indicates that the audio evidence is encoded but outvoted during arbitration. Activation patching localizes the reversal to the computation of answer-position scores, and the patching effects correlate strongly (Spearman rho = 0.93) with differences in output candidate scores.
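The sign-flip criterion above can be illustrated with a minimal sketch. It assumes each sample is summarized by a preference margin per branch, defined here as the score of the audio-supported answer minus the score of the text-supported answer; the abstract does not say how scores are computed, so the `Sample` type, its field names, and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Sample:
    # Assumed definition: margin = score(audio-supported) - score(text-supported).
    same_audio_margin: float  # conflicting text removed, audio unchanged
    joint_margin: float       # audio and conflicting text both present


def is_sign_flip(sample: Sample) -> bool:
    """True when the same-audio branch prefers the audio-supported answer
    but the joint branch prefers the text-supported answer."""
    return sample.same_audio_margin > 0 and sample.joint_margin < 0


def sign_flip_rate(samples: Sequence[Sample]) -> float:
    """Fraction of samples that show a sign flip (0.0 for an empty input)."""
    if not samples:
        return 0.0
    return sum(is_sign_flip(s) for s in samples) / len(samples)


# Made-up margins, for illustration only.
samples = [
    Sample(1.2, -0.5),   # flip: audio answer wins alone, loses jointly
    Sample(0.3, 0.4),    # no flip: audio answer wins in both branches
    Sample(-0.2, -0.9),  # no flip: audio answer loses in both branches
    Sample(2.0, -0.1),   # flip
]
rate = sign_flip_rate(samples)  # 2 of 4 samples flip, so 0.5
```

Under this reading, a sample whose audio-supported answer loses in both branches is not counted as a flip, since there is no evidence that the audio answer was ever preferred.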

Building on these findings, we introduce Gated Audio Counterfactual Logit Correction (GACL), a training-free decoding mechanism that interpolates between joint and same-audio scores. Under a strict budget that permits at most a 5 percentage-point drop in faithfulness, GACL improves nAUC by 17.8 points over the strongest contrastive baseline. The method also transfers to vision-text arbitration tasks without retuning, with improvements of up to +40.5 percentage points.
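A gated interpolation between joint and same-audio scores might look like the sketch below. This is not the paper's GACL implementation: the abstract gives neither the gate nor the interpolation weight, so the disagreement-based gate, the parameter `lam`, and the function names are all assumptions chosen for illustration.

```python
from typing import Dict


def top_candidate(scores: Dict[str, float]) -> str:
    """Return the candidate with the highest score."""
    return max(scores, key=scores.get)


def gated_interpolate(
    joint: Dict[str, float],
    same_audio: Dict[str, float],
    lam: float = 0.5,
) -> Dict[str, float]:
    """Blend joint and same-audio candidate scores.

    Hypothetical gate: the correction is applied only when the two
    branches disagree on the top candidate; otherwise the joint scores
    are returned unchanged. lam = 0 keeps the joint scores, lam = 1
    uses the same-audio scores.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if joint.keys() != same_audio.keys():
        raise ValueError("both branches must score the same candidates")
    if top_candidate(joint) == top_candidate(same_audio):
        return dict(joint)  # gate closed: no correction needed
    return {c: (1.0 - lam) * joint[c] + lam * same_audio[c] for c in joint}


# Made-up scores: the joint branch follows the text ("cat"),
# the same-audio branch follows the audio ("dog").
joint_scores = {"dog": -1.0, "cat": -0.4}
same_audio_scores = {"dog": -0.2, "cat": -2.0}
corrected = gated_interpolate(joint_scores, same_audio_scores, lam=0.5)
```

With these values the corrected scores rank "dog" first, while `lam=0.0` reproduces the joint ranking. A gate of this kind leaves samples untouched when the branches already agree, which is one plausible way to limit changes to the model's other outputs.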


Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
