arXiv

Toward Training Superintelligent Software Agents through Self-Play SWE-RL

Abstract:

Although large language model (LLM) agents trained with agentic reinforcement learning (RL) have shown promise in improving developer productivity, their current training pipelines face a significant bottleneck. They depend heavily on human-curated data, such as GitHub issues and pull requests, and on environments built around human-written tests (e.g., pass-to-pass and fail-to-pass tests). This dependence on human knowledge is a fundamental obstacle to achieving superintelligence. To address this, we introduce Self-play SWE-RL (SSR), a training paradigm intended as a first step toward superintelligent software agents.

SSR makes minimal data assumptions: it requires only sandboxed repositories containing source code and installed dependencies, with no human-labeled issues or tests. Grounded in these real-world codebases, a single LLM agent is trained with reinforcement learning in a self-play setting to iteratively inject and repair software bugs of increasing complexity. Crucially, each bug is formally specified by a test patch rather than a natural language description.
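The inject-and-repair loop described above can be illustrated with a toy Python sketch. This is not the authors' implementation: the abstract does not give the validity checks or the reward design, so the names `BugArtifact`, `is_valid_bug`, `solver_reward`, `inject_bug`, and `repair`, the binary reward, and the in-memory "repository" are all assumptions made for illustration. The only idea taken from the text is that a bug is specified by tests rather than by a written issue.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Toy stand-ins (assumptions): a "repository" maps function names to
# implementations, and a "test patch" returns True when its checks pass.
Repo = Dict[str, Callable[[int, int], int]]
TestPatch = Callable[[Repo], bool]


@dataclass
class BugArtifact:
    buggy_repo: Repo       # repository state after the bug was injected
    test_patch: TestPatch  # tests that formally specify the bug


def is_valid_bug(original: Repo, bug: BugArtifact) -> bool:
    """Assumed validity rule: tests pass on the original, fail on the buggy code."""
    return bug.test_patch(original) and not bug.test_patch(bug.buggy_repo)


def solver_reward(bug: BugArtifact, repaired: Repo) -> float:
    """Assumed binary reward: 1.0 if the repaired repo passes the test patch."""
    return 1.0 if bug.test_patch(repaired) else 0.0


def inject_bug(original: Repo) -> BugArtifact:
    """Hypothetical bug-injection role: break `add` and write tests that expose it."""
    buggy = dict(original)
    buggy["add"] = lambda a, b: a - b  # the injected fault

    def test_patch(repo: Repo) -> bool:
        return repo["add"](2, 3) == 5 and repo["add"](0, 0) == 0

    return BugArtifact(buggy_repo=buggy, test_patch=test_patch)


def repair(bug: BugArtifact) -> Repo:
    """Hypothetical repair role: here it simply restores the correct behavior."""
    fixed = dict(bug.buggy_repo)
    fixed["add"] = lambda a, b: a + b
    return fixed


# One round of self-play on the toy repository.
original: Repo = {"add": lambda a, b: a + b}
bug = inject_bug(original)
valid = is_valid_bug(original, bug)
reward = solver_reward(bug, repair(bug)) if valid else 0.0
```

In this sketch the repair role never sees a natural language description. The test patch alone defines what counts as fixed, which mirrors the test-based bug specification described in the abstract.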

On the SWE-bench Verified and SWE-Bench Pro benchmarks, SSR shows substantial self-improvement, gaining 10.4 and 7.8 points, respectively. It consistently outperforms the human-data baseline throughout the entire training trajectory, even though it is evaluated on natural language issues that are absent from self-play. While these findings are preliminary, they suggest a path in which agents autonomously gather extensive learning experience from real-world software repositories. This approach may ultimately lead to superintelligent systems that exceed human capabilities in understanding how systems are constructed, solving novel problems, and autonomously creating new software from scratch.


Source: arXiv Generated at: 2026-06-03 00:00:00 UTC
