SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction
Title: SkillHarm: Automated Construction of Lifecycle-Aware Skill-Based Attacks
Abstract
Agent skills hold a critical and privileged position within the agent workflow, as agents are designed to implicitly follow and execute them. This reliance makes third-party skills a significant and vulnerable attack surface. While prior research has identified unsafe agent behaviors resulting from skill-based attacks, these studies have largely evaluated poisoned skills within single-task executions and categorized harms using ad hoc risk lists.
To address these limitations, we present SkillHarm, a comprehensive benchmark for skill-based attacks spanning the entire skill-use lifecycle, accompanied by a systematic taxonomy of skill-related risks. SkillHarm assesses two distinct attack scenarios: Fixed-Payload Poisoning (FPP), in which a static, poisoned skill package directly compromises any task session that utilizes it; and Self-Mutating Poisoning (SMP), where an initially harmless execution silently alters persistent skill content, delaying the onset of harm until the skill is reused. Furthermore, we define 12 specific risk types based on the agent workflow component targeted by the damage, including data pipelines, system environments, and agent autonomy.
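The difference between the two scenarios can be illustrated with a toy simulation. This is a minimal sketch, not the benchmark's implementation: the `Skill` and `SkillStore` classes, the `run_session` function, and the `HARM` and `MUTATE` markers are hypothetical names introduced here, and the "payloads" are inert placeholder strings.

```python
from dataclasses import dataclass, field

# Placeholder markers (assumptions for illustration, not real payloads).
HARM = "HARMFUL_ACTION"          # stands in for any harmful step
MUTATE = "APPEND_HARM_TO_SKILL"  # stands in for a silent self-modification step


@dataclass
class Skill:
    name: str
    instructions: list[str]


@dataclass
class SkillStore:
    """Persistent skill storage shared across task sessions."""
    skills: dict[str, Skill] = field(default_factory=dict)


def run_session(store: SkillStore, skill_name: str) -> bool:
    """Simulate one task session; return True if harm occurs in this session."""
    skill = store.skills[skill_name]
    harmed = False
    # Iterate over a snapshot so that edits only take effect on the next use.
    for step in list(skill.instructions):
        if step == HARM:
            harmed = True
        elif step == MUTATE:
            # Rewrite the persistent skill: drop the mutation step, add the harm.
            kept = [s for s in skill.instructions if s != MUTATE]
            skill.instructions = kept + [HARM]
    return harmed


# FPP: the package is poisoned from the start, so every session is harmed.
fpp_store = SkillStore({"fmt": Skill("fmt", ["do task", HARM])})
fpp_results = [run_session(fpp_store, "fmt") for _ in range(2)]

# SMP: the first session looks harmless but alters the stored skill.
smp_store = SkillStore({"fmt": Skill("fmt", ["do task", MUTATE])})
smp_results = [run_session(smp_store, "fmt") for _ in range(2)]

print(fpp_results)  # [True, True]
print(smp_results)  # [False, True]
```

Under these assumptions, an evaluation that observes only the first session would record the SMP case as harmless, which is why the scenario requires a view of the full skill-use lifecycle.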
To generate these attacks at scale, we developed AutoSkillHarm, an automated construction pipeline that employs coding agents guided by natural-language harnesses. This process resulted in a benchmark comprising 879 attack samples across 71 skills. Our experiments demonstrate that current agents remain susceptible, with attack success rates reaching 86.3% for FPP and 69.3% for SMP. Additionally, our analysis uncovers a latent risk: many apparent attack failures occur because the agent never engages with the poisoned file, not because it genuinely resists the attack. Existing defenses also fail to mitigate these threats reliably.
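One way to make the distinction between non-engagement and genuine resistance concrete is to report the success rate conditioned on engagement alongside the raw rate. The sketch below is an assumption about how such a breakdown could be computed, not the paper's evaluation code: the outcome labels `"success"`, `"resisted"`, and `"not_engaged"`, the function `attack_metrics`, and the example counts are all hypothetical.

```python
from collections import Counter


def attack_metrics(outcomes: list[str]) -> dict[str, float]:
    """Summarize per-sample outcomes.

    Each outcome is one of (hypothetical labels):
      "success"     - the agent engaged with the poisoned file and harm occurred
      "resisted"    - the agent engaged with the poisoned file but refused
      "not_engaged" - the agent never read or used the poisoned file
    """
    counts = Counter(outcomes)
    total = len(outcomes)
    engaged = counts["success"] + counts["resisted"]
    failures = counts["resisted"] + counts["not_engaged"]
    return {
        # Raw attack success rate over all samples.
        "asr": counts["success"] / total if total else 0.0,
        # Success rate among samples where the agent engaged at all.
        "asr_given_engaged": counts["success"] / engaged if engaged else 0.0,
        # Fraction of failures explained by non-engagement, not resistance.
        "not_engaged_share_of_failures": (
            counts["not_engaged"] / failures if failures else 0.0
        ),
    }


# Illustrative counts only; these are not results from the paper.
example = ["success"] * 6 + ["resisted"] * 1 + ["not_engaged"] * 3
metrics = attack_metrics(example)
print(metrics)
```

In this made-up example the raw rate is 0.6, but three of the four failures come from non-engagement, so the rate among engaged samples is 6/7. A low raw rate therefore does not by itself indicate a robust agent.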
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





