arXiv

Capability and Robustness Cannot Both Be Free: An Information-Theoretic Bound for Vision-Language-Action Models

Title: The Inevitable Trade-off: An Information-Theoretic Limit on Capability and Robustness in Vision-Language-Action Models

Abstract:

Vision-Language-Action (VLA) models perform well on unperturbed inputs yet remain highly vulnerable to small adversarial perturbations. For instance, a PGD attack with a budget of $16/255$ drops the success rate of OpenVLA-7B on the LIBERO benchmark from $95\%$ to below $5\%$. Whether this capability-robustness trade-off is subject to a fundamental theoretical limit has long been an open question; we establish that it is. We show that for any VLA policy, the sum of its capability, defined as $I(\Astar;\Api)$, and its robustness, quantified as $I(\Api;\Atildepi)-I(\Api;\delta)$, is bounded above by $H(\Astar)+I(X;\Xtilde)$, the sum of the task entropy and the adversarial channel capacity. The derivation relies on two applications of the Data Processing Inequality.
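One way such a bound can arise is sketched below. This is an illustrative reconstruction, not the authors' proof: it assumes that $\Api$ depends only on the clean input $X$ and $\Atildepi$ only on the perturbed input $\Xtilde$, so that $\Api - X - \Xtilde - \Atildepi$ forms a Markov chain. The macros are the ones used in the abstract.

```latex
% Sketch under the assumed Markov chain \Api - X - \Xtilde - \Atildepi.
\begin{align*}
I(\Astar;\Api) &\le H(\Astar)
  && \text{(mutual information is at most the entropy)} \\
I(\Api;\Atildepi) &\le I(X;\Atildepi)
  && \text{(DPI on } \Api - X - \Atildepi\text{)} \\
I(X;\Atildepi) &\le I(X;\Xtilde)
  && \text{(DPI on } X - \Xtilde - \Atildepi\text{)} \\
I(\Api;\delta) &\ge 0
  && \text{(non-negativity)} \\
\Rightarrow\quad
I(\Astar;\Api) + \bigl[I(\Api;\Atildepi) - I(\Api;\delta)\bigr]
  &\le H(\Astar) + I(X;\Xtilde).
\end{align*}
```

Under these assumptions the two Data Processing Inequality steps bound the robustness term by the channel term, while the capability term is bounded by the task entropy alone.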

The pixel-level bound is a loose ceiling, slack by approximately $10^3$ nats, but an encoder-specific corollary tightens it by more than an order of magnitude. In this tighter regime, realized capability already accounts for $5\%$ to $9\%$ of the total information budget. We empirically validate the main theorem and observe zero violations across 308 test cells: 252 closed-form Gaussian-VLA configurations, 48 OpenVLA-7B runs on LIBERO under PGD attack (4 suites $\times$ 4 values of $\eps$ $\times$ 3 seeds), 4 Square-Attack instances, and 4 multi-step scenarios ($T=10$).
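The idea of a violation check can be illustrated on a toy discrete problem. The sketch below is not the paper's evaluation code and none of its names come from the paper: it assumes a hypothetical world where $X$ is uniform on four values, the task label is the high bit of $X$, the perturbation $\delta$ flips the low bit independently of $X$, and the policy is any deterministic map from four inputs to a binary action. It computes every mutual-information term exactly from the joint distribution and lists the cells where capability plus robustness exceeds the budget.

```python
import itertools
import math
from collections import defaultdict


def mutual_information(joint):
    """I(U;V) in nats from a joint distribution {(u, v): probability}."""
    pu, pv = defaultdict(float), defaultdict(float)
    for (u, v), p in joint.items():
        pu[u] += p
        pv[v] += p
    return sum(p * math.log(p / (pu[u] * pv[v]))
               for (u, v), p in joint.items() if p > 0)


def entropy(marginal):
    """Shannon entropy in nats of a marginal {value: probability}."""
    return -sum(p * math.log(p) for p in marginal.values() if p > 0)


def check_cell(policy, flip_prob):
    """Return (capability + robustness, budget) for one toy cell.

    Toy assumptions (not the paper's setup): X is uniform on {0,1,2,3},
    the task label is the high bit of X, and delta flips the low bit of X
    with probability flip_prob, independently of X.
    """
    cap_j, rob_j, leak_j, chan_j = (defaultdict(float) for _ in range(4))
    a_star_marginal = defaultdict(float)
    for x, delta in itertools.product(range(4), (0, 1)):
        p = 0.25 * (flip_prob if delta else 1.0 - flip_prob)
        x_tilde = x ^ delta                      # perturbed input
        a_star, a_pi, a_tilde = x // 2, policy[x], policy[x_tilde]
        cap_j[(a_star, a_pi)] += p               # joint of (A*, A_pi)
        rob_j[(a_pi, a_tilde)] += p              # joint of (A_pi, A~_pi)
        leak_j[(a_pi, delta)] += p               # joint of (A_pi, delta)
        chan_j[(x, x_tilde)] += p                # joint of (X, X~)
        a_star_marginal[a_star] += p
    capability = mutual_information(cap_j)
    robustness = mutual_information(rob_j) - mutual_information(leak_j)
    budget = entropy(a_star_marginal) + mutual_information(chan_j)
    return capability + robustness, budget


def find_violations(flip_probs=(0.1, 0.25, 0.5), tol=1e-9):
    """Enumerate all 16 deterministic binary policies; list bound violations."""
    violations = []
    for policy in itertools.product((0, 1), repeat=4):
        for q in flip_probs:
            total, budget = check_cell(policy, q)
            if total > budget + tol:
                violations.append((policy, q, total, budget))
    return violations


if __name__ == "__main__":
    print("violations:", find_violations())
```

Because the policy here is deterministic, the Markov structure behind the bound holds by construction, so an empty list is the expected outcome; the check is a sanity illustration rather than evidence about real VLA models.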

Furthermore, a complementary measurability inequality, $\Rob_{\text{disc}} \le \Cap_{\text{disc}}$, holds across 144 cross-architecture cells spanning OpenVLA, OpenVLA-OFT (continuous $L_1$ regression), and SmolVLA (flow matching). The framework also yields three label-free diagnostics: a pre-flight encoder ceiling, a defense-forensics probe that distinguishes input-side interventions from language-model interventions, and a head-agnostic robustness ratio that allows consistent comparison across discrete-token, $L_1$-regression, and flow-matching policies. Together, these results provide a unified axis for comparing defense strategies and architectures.


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
