Defenses & Enablers For Skill Injection Attacks on Terminal Based Agents
Title: Mitigating Skill Injection Attacks in Terminal-Based Agents: Defensive Strategies and Enabling Mechanisms
Abstract:
As Large Language Model (LLM) agents increasingly integrate reusable skills (documents outlining specific, task-oriented procedures), they inadvertently create new attack vectors that require careful management. This research investigates two complementary approaches to addressing this vulnerability. First, we assess the efficacy of guardian-based defenses, which employ an intermediary LLM agent to mediate access to skill files. These guardians operate in one of two modes: dynamically, by intervening at runtime, or statically, by rewriting skill files before the build phase. Our evaluation across three distinct LLM agent families demonstrates that these guardians reduce the Attack Success Rate (ASR) by more than 50% without compromising task utility.
Second, we stress-test these defenses against attack reframing: four distinct attacks that preserve the malicious intent while altering the phrasing. In environments without guardians, reframing raises the ASR to 81.4%. A dynamic guardian, however, reduces this rate to 18.6%, underscoring the robustness of real-time mediation as a defensive strategy.
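To put the two reported ASR figures in perspective, the following minimal sketch computes an attack success rate from a list of per-attempt outcomes and the drop between an undefended and a defended setting. The function names and the boolean outcome format are illustrative assumptions, not taken from the paper, and the abstract does not state whether its "more than 50%" figure is an absolute or a relative reduction, so both are shown.

```python
def attack_success_rate(outcomes):
    """Percentage of attack attempts that succeeded.

    `outcomes` is an assumed format: one boolean per attempt,
    True when the injected instruction was carried out.
    """
    if not outcomes:
        raise ValueError("need at least one attack attempt")
    return 100.0 * sum(1 for o in outcomes if o) / len(outcomes)


def absolute_reduction(baseline_asr, defended_asr):
    """Drop in ASR, in percentage points."""
    return baseline_asr - defended_asr


def relative_reduction(baseline_asr, defended_asr):
    """Drop in ASR as a percentage of the baseline ASR."""
    if baseline_asr <= 0:
        raise ValueError("baseline ASR must be positive")
    return 100.0 * (baseline_asr - defended_asr) / baseline_asr


# ASR values reported in the abstract for reframed attacks:
# 81.4% without guardians, 18.6% with a dynamic guardian.
abs_drop = absolute_reduction(81.4, 18.6)
rel_drop = relative_reduction(81.4, 18.6)
print(f"absolute drop: {abs_drop:.1f} points, relative drop: {rel_drop:.1f}%")
```

With the reported figures, the drop is 62.8 percentage points, or roughly 77% of the undefended rate, so the reduction exceeds 50% under either reading.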
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC




