Scaling Agentic Capabilities via Grounded Interaction Synthesis
Original: arXiv:2606.02001v1. Abstract: General agentic intelligence hinges on the ability to interact with diverse real-world tools to complete complex tasks, a capability fundamentally tied to the quality of interaction data. To bypass the prohibitive costs of human annotation, prevailing paradigms depend entirely on Large Language Models (LLMs) to scale the synthesis of agentic environments and tasks. However, such unconstrained generation often degenerates into biased random sampling of LLMs' internal priors, failing to capture the diversity and difficulty of real-world domains or construct high-fidelity, long-horizon tasks. In this work, we introduce Grounded Agentic Interaction Synthesis (GAIS), a framework that automates the scalable construction of diverse environments and complex tasks via a two-phase grounding mechanism. Specifically, we construct protocol-anchored environments derived from real-world Model Context Protocol (MCP) servers to ensure functional diversity and difficulty. Subsequently, we employ structure-guided planning to navigate these environments, actively enforcing logical dependencies and adversarial policies to generate complex tasks. Experiments on BFCL, $\tau^2$-Bench, and ACEBench demonstrate that GAIS-synthesized data significantly outperforms state-of-the-art baselines, enabling base models to match or even surpass their official instruction-tuned counterparts. Furthermore, GAIS exhibits superior data efficiency and scalability, achieving exceptional capabilities with significantly less data while maintaining continuous growth where baselines stagnate. Our code and dataset are publicly available at https://github.com/Eric8932/GAIS.
Rewrite: Abstract: General agentic intelligence depends on the ability to work with a wide range of real-world tools to complete complex tasks, a capability closely tied to the quality of interaction data. To avoid the prohibitive cost of human annotation, prevailing approaches rely entirely on Large Language Models (LLMs) to scale the synthesis of agentic environments and tasks. This unconstrained generation, however, often degenerates into biased random sampling of the LLMs' internal priors, which fails to capture the diversity and difficulty of real-world domains or to build high-fidelity, long-horizon tasks. In this work, the authors present Grounded Agentic Interaction Synthesis (GAIS), a framework that automates the scalable construction of diverse environments and complex tasks through a two-phase grounding mechanism. First, they construct protocol-anchored environments derived from real-world Model Context Protocol (MCP) servers to ensure functional diversity and difficulty. Next, they use structure-guided planning to navigate these environments, actively enforcing logical dependencies and adversarial policies to generate complex tasks. Experiments on BFCL, $\tau^2$-Bench, and ACEBench show that GAIS-synthesized data significantly outperforms state-of-the-art baselines, enabling base models to match or even surpass their official instruction-tuned counterparts. GAIS also shows better data efficiency and scalability: it reaches strong performance with significantly less data and keeps improving as data grows, whereas baselines stagnate. The code and dataset are publicly available at https://github.com/Eric8932/GAIS.
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
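Illustration: the abstract says that structure-guided planning enforces logical dependencies between tool calls, but it does not describe how. The sketch below shows one generic way to get a dependency-respecting call order, using a topological sort over a tool graph. It is not the GAIS implementation. The tool names, the `needs`/`gives` fields, and the functions `dependency_graph`, `plan`, and `is_valid` are all hypothetical and chosen only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical tool specs (not taken from the paper): each tool lists the
# fields it requires and the fields it produces.
TOOLS = {
    "search_flights": {"needs": set(), "gives": {"flight_id"}},
    "get_user_profile": {"needs": set(), "gives": {"user_id"}},
    "book_flight": {"needs": {"flight_id", "user_id"}, "gives": {"booking_id"}},
    "send_receipt": {"needs": {"booking_id"}, "gives": set()},
}


def dependency_graph(tools):
    """Map each tool to the set of tools that produce one of its required fields."""
    return {
        name: {
            other
            for other, other_spec in tools.items()
            if other != name and spec["needs"] & other_spec["gives"]
        }
        for name, spec in tools.items()
    }


def plan(tools):
    """Return one call order in which every tool runs after its producers."""
    return list(TopologicalSorter(dependency_graph(tools)).static_order())


def is_valid(order, tools):
    """Check that each call's required fields were produced by earlier calls."""
    available = set()
    for name in order:
        if not tools[name]["needs"] <= available:
            return False
        available |= tools[name]["gives"]
    return True


if __name__ == "__main__":
    order = plan(TOOLS)
    print(order, is_valid(order, TOOLS))
```

A task generator built along these lines could reject any sampled trajectory for which `is_valid` returns `False`, so that multi-step tasks only chain calls whose inputs are actually available. Whether GAIS does anything similar is not stated in the abstract.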





