TRAP: Hijacking VLA CoT-Reasoning via Adversarial Patches
Abstract:
Vision-Language-Action (VLA) models have shown strong performance in robotic manipulation, in large part because they integrate Chain-of-Thought (CoT) reasoning, which improves both interpretability and generalization. Despite these advances, the security implications of CoT-based reasoning have received little attention. This study shows that CoT reasoning creates a new vulnerability to targeted behavior hijacking: a robot can be forced to execute an incorrect action, such as handing a person a knife instead of an apple, without any change to the original user command.
Our empirical analysis shows that CoT plays a dominant role in directing action generation, even when the reasoning is semantically disconnected from the input instruction. Building on this insight, we introduce TRAP, the first targeted adversarial attack on CoT-reasoning VLA models. TRAP exploits the pathway from reasoning to action: an adversarial patch (for example, a specially designed tablecloth) manipulates the intermediate CoT steps and steers subsequent actions toward malicious, adversary-specified outcomes.
Extensive evaluations across three prominent reasoning VLAs, each using a distinct CoT mechanism, confirm TRAP’s efficacy. In a real-world demonstration, the attack succeeds with a patch printed on standard paper. These results underscore the need to secure CoT reasoning in VLA architectures. Further details and resources are available at https://zhengxian-huang.github.io/TRAP-website/.
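The general idea of a targeted adversarial patch, as described in the abstract, can be illustrated with a deliberately small sketch. This is not TRAP's actual objective or model: it assumes a toy linear "reasoning head" `W` that maps a flattened image to logits over a handful of hypothetical reasoning tokens, and it optimizes only the pixels inside a fixed patch region so that an adversary-chosen token becomes more likely. All names (`optimize_patch`, `W`, `mask`, `target`) and all sizes are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def optimize_patch(W, image, mask, target, steps=300):
    """Projected gradient descent on the patch pixels of a toy linear model.

    ASSUMPTION: W is a stand-in for a real VLA's reasoning head; a real
    attack would backpropagate through the full model instead.
    """
    # Step size 1/L, where L bounds the curvature of the cross-entropy loss
    # with respect to the patch pixels (log-sum-exp Hessian norm <= 1/2).
    lr = 1.0 / (0.5 * np.linalg.norm(W[:, mask], 2) ** 2)
    patch = image.copy()         # start from the clean pixel values
    for _ in range(steps):
        x = np.where(mask, patch, image)      # paste the patch into the scene
        grad_logits = softmax(W @ x)
        grad_logits[target] -= 1.0            # d(cross-entropy)/d(logits)
        grad_x = W.T @ grad_logits            # chain rule back to the pixels
        # Update only patch pixels, then project back to valid intensities.
        patch = np.clip(patch - lr * grad_x * mask, 0.0, 1.0)
    return patch

rng = np.random.default_rng(0)
n_pixels, n_tokens = 64, 5                    # hypothetical toy sizes
W = rng.normal(size=(n_tokens, n_pixels))
image = rng.uniform(size=n_pixels)            # clean scene, values in [0, 1]
mask = np.zeros(n_pixels, dtype=bool)
mask[:16] = True                              # region the adversary controls
target = 3                                    # adversary-chosen token index

patch = optimize_patch(W, image, mask, target)
adv = np.where(mask, patch, image)
p_clean = softmax(W @ image)
p_adv = softmax(W @ adv)
print("target probability, clean vs. patched:", p_clean[target], p_adv[target])
```

Only the masked pixels change, which mirrors the physical constraint that the adversary controls an object in the scene (such as a printed patch) and not the user command or the rest of the image. Whether the target token actually becomes the top prediction depends on how large the patch region is relative to the rest of the input.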
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



