Title: Toward Training Superintelligent Software Agents through Self-Play SWE-RL
Abstract:
Although large language model (LLM) agents trained with agentic reinforcement learning (RL) have shown potential for improving developer productivity, their current training pipelines face a significant bottleneck. They rely heavily on human-curated data, such as GitHub issues and pull requests, and on environments built from human-written test cases (e.g., pass-to-pass or fail-to-pass tests). This dependence on human knowledge is a fundamental obstacle to achieving superintelligence. To address it, we introduce Self-Play SWE-RL (SSR), a training paradigm intended as a first step toward superintelligent software agents.
SSR makes minimal data assumptions: it requires only access to sandboxed repositories containing source code and installed dependencies, with no need for human-labeled issues or tests. Grounded in these real-world codebases, a single LLM agent is trained through reinforcement learning in a self-play setting, in which it iteratively injects and repairs software bugs of increasing complexity. Crucially, each bug is formally specified by a test patch rather than by a natural language description.
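The idea of specifying a bug through a test patch can be illustrated with a toy sketch. Everything below is a hypothetical simplification for illustration only: the names `BugTask`, `is_valid_bug`, and `solver_reward`, the validity rule, and the binary reward are assumptions, not the paper's implementation, which operates on real sandboxed repositories.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical toy types: "code" is a two-argument function and a
# "test patch" is a list of (arguments, expected result) cases.
Code = Callable[[int, int], int]
TestPatch = List[Tuple[Tuple[int, int], int]]


@dataclass
class BugTask:
    """A bug proposed by the injecting role, specified only by tests."""
    buggy_code: Code
    test_patch: TestPatch


def passes(code: Code, tests: TestPatch) -> bool:
    """Return True if the code satisfies every case in the test patch."""
    return all(code(*args) == expected for args, expected in tests)


def is_valid_bug(original: Code, task: BugTask) -> bool:
    """Assumed validity rule: tests pass on the original, fail on the bug."""
    return passes(original, task.test_patch) and not passes(
        task.buggy_code, task.test_patch
    )


def solver_reward(candidate_fix: Code, task: BugTask) -> float:
    """Assumed binary reward: 1.0 if the candidate fix passes the tests."""
    return 1.0 if passes(candidate_fix, task.test_patch) else 0.0


def original_add(a: int, b: int) -> int:
    return a + b


def injected_bug(a: int, b: int) -> int:
    return a - b  # the injected defect


task = BugTask(buggy_code=injected_bug, test_patch=[((2, 3), 5), ((0, 0), 0)])
```

In this sketch the test patch alone tells the repairing role what "fixed" means, so no natural language issue is needed: `is_valid_bug(original_add, task)` accepts the task, and `solver_reward` scores any candidate repair against the same tests.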
On the SWE-bench Verified and SWE-Bench Pro benchmarks, SSR shows notable self-improvement, gaining 10.4 and 7.8 points, respectively. Throughout the entire training trajectory, SSR consistently outperforms the baseline trained on human data, even though it is evaluated on natural language issues that never appear in the self-play data. While these findings are preliminary, they point to a promising direction: agents that autonomously gather extensive learning experience from real-world software repositories. This approach may ultimately lead to superintelligent systems that exceed human capabilities in understanding how systems are built, solving novel problems, and autonomously creating new software from scratch.
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



