arXiv

Beyond False Stability: High-Noise Drift Gating for Test-Time Adversarial Defenses in Vision-Language Models


Abstract

Vision-Language Models (VLMs) such as CLIP show impressive zero-shot generalization, yet they remain highly susceptible to adversarial attacks. Adversarial training can improve robustness, but its high computational cost has driven interest in test-time defenses. Existing methods typically exploit how CLIP’s visual representations behave under stochastic perturbations: they aggregate predictions over multiple noisy views, interpolate features toward Gaussian noise-averaged anchors, or apply counter-perturbations. These approaches improve robustness but often sacrifice clean accuracy, yielding a suboptimal trade-off between the two.
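To make the noise-anchoring idea concrete, here is a minimal sketch, assuming a generic embedding function stands in for CLIP's image encoder and that inputs are plain Python lists rather than image tensors. The names `noise_anchor_features` and `normalize`, and the defaults for `sigma`, `n_samples`, and `alpha`, are hypothetical and are not taken from the paper or from any specific published defense.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale v to unit L2 norm (embeddings are compared by direction)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else list(v)


def noise_anchor_features(
    encode: Callable[[Vector], Vector],
    x: Vector,
    sigma: float = 0.1,
    n_samples: int = 8,
    alpha: float = 0.5,
    seed: int = 0,
) -> Vector:
    """Interpolate the embedding of x toward a Gaussian-noise-averaged anchor.

    `encode` is a placeholder for an image encoder; sigma, n_samples and
    alpha are illustrative values, not settings from the paper.
    """
    rng = random.Random(seed)
    base = normalize(encode(x))
    # Anchor: average of unit-normalized embeddings of Gaussian-noised copies.
    anchor = [0.0] * len(base)
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        feat = normalize(encode(noisy))
        anchor = [a + f / n_samples for a, f in zip(anchor, feat)]
    # Linear interpolation toward the anchor, then re-normalize.
    mixed = [(1 - alpha) * b + alpha * a for b, a in zip(base, anchor)]
    return normalize(mixed)
```

With `alpha=0.0` the function returns the unmodified (normalized) embedding; larger `alpha` pulls features further toward the noise-averaged anchor, which is where the robustness gain and the clean-accuracy cost of such defenses would both come from.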

This study re-examines stochastic test-time defenses and identifies a previously overlooked noise-regime transition in CLIP’s representation space. Prior work has focused mainly on the weak-noise regime, where adversarial examples can appear deceptively stable ("false stability"). Our analysis shows that this relationship inverts as perturbation strength increases: beyond the weak-noise regime, adversarial representations become markedly less stable than clean ones, yielding a clearer separation signal. The transition holds across uniform and Gaussian noise, photometric and geometric transformations, different datasets, and different attacks. Notably, it largely disappears in adversarially trained models, suggesting a link to the fragile local-basin geometry of non-robust CLIP models.
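The drift statistic behind this kind of analysis can be sketched as the mean cosine distance between the embedding of an input and the embeddings of its noisy copies, swept over noise scales. This is a minimal sketch under stated assumptions: `encode` is a placeholder for a CLIP image encoder, and `feature_drift`, `drift_curve`, and the use of cosine distance as the drift measure are illustrative choices, not the paper's exact definitions.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def feature_drift(
    encode: Callable[[Vector], Vector],
    x: Vector,
    sigma: float,
    n_samples: int = 16,
    noise: str = "gaussian",
    seed: int = 0,
) -> float:
    """Mean of 1 - cos(f(x), f(x + eps)) over n_samples noise draws."""
    rng = random.Random(seed)
    ref = encode(x)
    total = 0.0
    for _ in range(n_samples):
        if noise == "gaussian":
            eps = [rng.gauss(0.0, sigma) for _ in x]
        elif noise == "uniform":
            eps = [rng.uniform(-sigma, sigma) for _ in x]
        else:
            raise ValueError(f"unknown noise type: {noise}")
        noisy = [xi + ei for xi, ei in zip(x, eps)]
        total += 1.0 - cosine(ref, encode(noisy))
    return total / n_samples


def drift_curve(
    encode: Callable[[Vector], Vector],
    x: Vector,
    sigmas: Sequence[float],
    **kwargs,
) -> Dict[float, float]:
    """Drift at each noise scale, e.g. to look for a weak-to-strong noise transition."""
    return {s: feature_drift(encode, x, s, **kwargs) for s in sigmas}
```

Comparing `drift_curve` for a clean image and its adversarial counterpart is one way to look for the crossover described above: comparable or lower drift for the adversarial input at small `sigma`, and higher drift at large `sigma`.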

Building on this finding, we introduce a training-free, plug-in drift-gated mechanism. It uses feature drift under high noise as a lightweight gating signal, activating an existing test-time defense only when adversarial-like instability is detected. Across 13 datasets, our method consistently improves the clean-robust accuracy trade-off. On eight fine-grained datasets, mean accuracy over clean and adversarial samples rises from 65.7% to 71.4% for counterattack-based defenses and from 68.4% to 73.2% for noise-anchoring defenses. On ImageNet and four shifted variants, the corresponding gains for these two defenses are from 56.1% to 66.2% and from 62.1% to 67.6%, respectively.
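A drift gate of this kind might be wired up as follows. This is a hedged sketch, not the authors' implementation: `encode`, `classify`, and `defense` are placeholder callables (the last standing in for an existing test-time defense such as counterattacks or noise anchoring), and the noise scale, sample count, and quantile-based threshold calibration on clean data are assumptions made for illustration. The helper names `high_noise_drift`, `calibrate_threshold`, and `drift_gated_predict` are hypothetical.

```python
import math
import random
from typing import Any, Callable, List, Sequence, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def high_noise_drift(
    encode: Callable[[Vector], Vector],
    x: Vector,
    sigma: float = 0.5,  # assumed "high" noise scale for inputs in [0, 1]
    n_samples: int = 8,
    seed: int = 0,
) -> float:
    """Mean cosine drift of the embedding under strong Gaussian noise."""
    rng = random.Random(seed)
    ref = encode(x)
    total = 0.0
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        total += 1.0 - cosine(ref, encode(noisy))
    return total / n_samples


def calibrate_threshold(clean_drifts: Sequence[float], quantile: float = 0.95) -> float:
    """Hypothetical calibration: a high quantile of drift measured on clean inputs."""
    ordered = sorted(clean_drifts)
    idx = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[idx]


def drift_gated_predict(
    encode: Callable[[Vector], Vector],
    classify: Callable[[Vector], Any],
    defense: Callable[[Vector], Any],
    x: Vector,
    threshold: float,
    sigma: float = 0.5,
    n_samples: int = 8,
    seed: int = 0,
) -> Tuple[Any, bool]:
    """Return (prediction, defended): run `defense` only if drift exceeds `threshold`."""
    drift = high_noise_drift(encode, x, sigma, n_samples, seed)
    if drift > threshold:
        return defense(x), True
    return classify(x), False
```

Because the defense runs only when drift exceeds the threshold, inputs below it keep the undefended prediction; this is how gating can avoid the clean-accuracy cost that always-on defenses incur, while still routing high-drift inputs through the defense.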


Source: arXiv Generated at: 2026-06-03 00:00:00 UTC
