arXiv

AIP: A Graph Representation for Learning and Governing Agent Skills

June 4, 2026 · Zachary Blumenfeld, Jim Webber

Original: arXiv:2606.04781v1 Announce Type: new

Abstract:

Currently, agent skills are predominantly defined in free-form prose, forcing the agent to read, interpret, and re-derive action strategies from scratch in every session. This approach has two compounding drawbacks: it undermines reliability on implementation-heavy tasks, and it complicates the creation and refinement of skills. Editing prose is unstable and difficult for both humans and agents, especially for domain-specific procedural knowledge that is scarce in model training data.

The Agent Instruction Protocol (AIP) addresses these issues by representing skills as directed execution graphs. In this model, discrete steps serve as nodes, each backed by either a deterministic script or a natural-language description. Nodes are linked by explicit, typed input/output edges, and the whole graph is governed by a schema-validated YAML specification. A meta-skill compiler translates existing human-authored skills into this structured format.
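To make the graph model concrete, the following is a minimal Python sketch of nodes joined by typed input/output edges, with a validation pass and an execution order. It is an illustration under stated assumptions, not the AIP schema: the class names, the `kind` labels, the port and type names, and the example skill are all hypothetical, and the real specification is YAML rather than Python objects.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Node:
    """One skill step. All field names here are hypothetical, not AIP's."""
    name: str
    kind: str  # "script" or "prose" (assumed labels)
    inputs: dict[str, str] = field(default_factory=dict)   # port -> type
    outputs: dict[str, str] = field(default_factory=dict)  # port -> type


@dataclass
class Edge:
    """Typed link from one node's output port to another node's input port."""
    src: str
    src_port: str
    dst: str
    dst_port: str


def validate(nodes, edges):
    """Return a list of error strings; an empty list means no problems found."""
    by_name = {n.name: n for n in nodes}
    errors = []
    for e in edges:
        src, dst = by_name.get(e.src), by_name.get(e.dst)
        if src is None or dst is None:
            errors.append(f"unknown node on edge {e.src} -> {e.dst}")
            continue
        out_type = src.outputs.get(e.src_port)
        in_type = dst.inputs.get(e.dst_port)
        if out_type is None or in_type is None:
            errors.append(f"unknown port on edge {e.src} -> {e.dst}")
        elif out_type != in_type:
            errors.append(
                f"type mismatch: {e.src}.{e.src_port} is {out_type}, "
                f"{e.dst}.{e.dst_port} expects {in_type}"
            )
    return errors


def execution_order(nodes, edges):
    """Order nodes so every producer runs before its consumers."""
    sorter = TopologicalSorter({n.name: set() for n in nodes})
    for e in edges:
        sorter.add(e.dst, e.src)  # dst depends on src
    return list(sorter.static_order())  # raises CycleError on a cycle


# Hypothetical two-step skill: a script node feeding a prose node.
nodes = [
    Node("fetch", "script", outputs={"csv_path": "path"}),
    Node("summarize", "prose", inputs={"csv_path": "path"},
         outputs={"report": "text"}),
]
edges = [Edge("fetch", "csv_path", "summarize", "csv_path")]

print(validate(nodes, edges))
print(execution_order(nodes, edges))
```

Checking port types before anything runs is what lets a malformed skill be rejected at authoring time rather than discovered mid-session.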

The approach has two advantages. First, converting human-written skills into AIP format improved performance. Across 27 real-world agent tasks from SkillsBench, Claude Sonnet’s mean task reward rose from 0.60 to 0.71, and its pass rate rose from 53% to 67%. The improvement was statistically significant (Wilcoxon signed-rank p = 0.011), with 12 wins, 2 losses, and 13 ties across tasks, often in less wall-clock time. Because the graph gives the agent vetted, executable units instead of requiring it to deduce code, commands, and tool calls from natural language, the agent works more efficiently and accurately.
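As a rough sanity check on the win/loss split, a two-sided sign test can be computed from the 12 wins and 2 losses alone (ties are dropped). This is a cruder test than the Wilcoxon signed-rank test the abstract reports, which also uses the magnitude of each per-task difference, so the two p-values are not expected to match. The function name below is hypothetical.

```python
from math import comb


def sign_test_two_sided(wins, losses):
    """Exact two-sided sign test p-value under a fair-coin null; ties excluded."""
    n = wins + losses
    k = min(wins, losses)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


p = sign_test_two_sided(12, 2)
print(f"sign test p = {p:.4f}")
```

The result, 212/16384 or about 0.013, is of the same order as the reported Wilcoxon p = 0.011.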

Second, AIP simplifies skill creation and improvement. Because each skill is schema-validated, functionally testable, and composed of addressable nodes, failures can be located and corrected precisely. For instance, two failures in authored skills were traced directly to the script level. After the AIP specification was adjusted and recompiled, both skills recovered without regressions, and one task improved from 0/5 to 5/5. This turns skill improvement into a measurable tuning loop rather than uncertain prose rewriting. The graph structure also supports corpus-level governance and skill introspection, and it offers a natural action space for applying reinforcement learning to skill management.
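The tuning loop described above can be sketched as per-node functional tests: each node is exercised in isolation, so a failure points at one addressable step, and the same tests confirm that a fix introduces no regressions elsewhere. This is a minimal sketch under assumptions; the helper `run_node_tests`, the node names, and the sample bug are hypothetical and are not taken from the paper's tooling.

```python
def run_node_tests(steps, cases):
    """Test each node in isolation; return the names of nodes that fail."""
    failing = []
    for name, fn in steps.items():
        for inputs, expected in cases.get(name, []):
            try:
                ok = fn(**inputs) == expected
            except Exception:
                ok = False  # a crash counts as a failure of this node
            if not ok:
                failing.append(name)
                break
    return failing


# Hypothetical script-backed nodes for a small invoice skill.
def parse_amount_buggy(text):
    return float(text)  # breaks on thousands separators


def parse_amount_fixed(text):
    return float(text.replace(",", ""))


def total(amounts):
    return sum(amounts)


cases = {
    "parse": [({"text": "12.5"}, 12.5), ({"text": "1,200.50"}, 1200.5)],
    "total": [({"amounts": [1.0, 2.0]}, 3.0)],
}

before = run_node_tests({"parse": parse_amount_buggy, "total": total}, cases)
after = run_node_tests({"parse": parse_amount_fixed, "total": total}, cases)
print(before, after)
```

Only the `parse` node is reported before the fix, and nothing is reported after it, which is the sense in which the loop is measurable rather than a matter of rereading prose.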
