arXiv

PaCo-VLA: Passivity-Shielded Compliance Prior for Contact-Rich Vision-Language-Action Manipulation

June 2, 2026 · Haofan Cao, Zhaoyang Li, Zhichao You, Liang Guo, Tianrui Li

Title: PaCo-VLA: A Passivity-Shielded Compliance Prior for Vision-Language-Action Manipulation in Contact-Rich Scenarios

Abstract: Contact-rich manipulation requires two capabilities at once: high-level semantic reasoning and safe handling of high-frequency contact dynamics. Vision-Language-Action (VLA) models generalize well semantically, but their low output rates make them unreliable as the direct control authority in force-sensitive tasks. To bridge this gap between semantics and control, we present PaCo-VLA, which redefines the VLA interface through a passivity-shielded compliance prior. Rather than issuing motor commands directly, PaCo-VLA treats network outputs as task-level compliance proposals comprising semantic bindings, task stages, and admittance schedules. These proposals are filtered by a proposal-independent, high-frequency passivity shield that uses energy-tank accounting and boundary checks, so that invalid, stale, or unverified model predictions cannot override low-level contact physics. The decoupled design also supports causal evaluation by separating semantic contributions from geometric shortcuts. Experiments on simulated and real-world connector-insertion tasks show that PaCo-VLA is more precise than unshielded VLA baselines and incurs zero passivity violations, even under adversarial compliance shifts. The framework establishes a provably sampled-passive runtime contract at the admittance port, offering a practical runtime interface for deploying foundation models in contact-rich domains.
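To make the shield idea concrete, here is a minimal one-degree-of-freedom sketch of energy-tank accounting combined with boundary and staleness checks. It is an illustration under stated assumptions, not the authors' implementation: the `Proposal` and `Shield` classes, every numeric limit, and the rule that a stiffness increase costs 0.5 · ΔK · x² from the tank are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    """Hypothetical compliance proposal emitted by the slow VLA model."""
    stiffness: float   # N/m
    damping: float     # N*s/m
    age_s: float       # seconds since the model produced it


@dataclass
class Shield:
    """Illustrative 1-DoF energy-tank shield; all limits are assumed values."""
    stiffness: float = 200.0
    damping: float = 30.0
    tank_j: float = 0.5                # current tank energy (J)
    tank_max_j: float = 2.0            # cap on stored energy
    tank_min_j: float = 0.05           # reserve that must never be spent
    k_bounds: tuple = (50.0, 1000.0)
    d_bounds: tuple = (5.0, 100.0)
    max_age_s: float = 0.2

    def step(self, proposal, x_err, vel, dt):
        """Run one high-rate tick; return the (stiffness, damping) actually applied."""
        # Damping dissipates d * v^2 * dt; bank it in the tank, saturating at the cap.
        self.tank_j = min(self.tank_max_j,
                          self.tank_j + self.damping * vel ** 2 * dt)

        if proposal is None or not self._valid(proposal):
            return self.stiffness, self.damping  # keep the last accepted gains

        # Raising stiffness at displacement x_err adds 0.5 * dK * x_err^2 of spring energy.
        cost = 0.5 * (proposal.stiffness - self.stiffness) * x_err ** 2
        if cost > 0.0:
            if self.tank_j - cost < self.tank_min_j:
                return self.stiffness, self.damping  # cannot afford it: reject
            self.tank_j -= cost
        self.stiffness, self.damping = proposal.stiffness, proposal.damping
        return self.stiffness, self.damping

    def _valid(self, p):
        """Boundary and freshness checks that ignore how the proposal was produced."""
        k_lo, k_hi = self.k_bounds
        d_lo, d_hi = self.d_bounds
        return (p.age_s <= self.max_age_s
                and k_lo <= p.stiffness <= k_hi
                and d_lo <= p.damping <= d_hi)
```

In this sketch the shield runs every control tick whether or not a new proposal has arrived, so a stale, out-of-range, or unaffordable proposal simply leaves the previously accepted gains in place. Lowering stiffness is accepted without refunding the tank, which is a deliberately conservative choice.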

Source: arXiv Generated at: 2026-06-02 00:00:00 UTC