Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses
Title: Harness-1: Leveraging Reinforcement Learning for Search Agents via State-Externalizing Harnesses
Abstract:
Training search agents typically involves developing policies over expanding transcripts, a process where the model is tasked with navigating search strategies while simultaneously tracking observed data, identifying relevant evidence, monitoring open constraints, and verifying claims. We contend that this approach burdens the policy with excessive routine state management. Consequently, reinforcement learning is compelled to optimize both semantic search choices and recoverable bookkeeping tasks that the environment could handle more reliably. To address this, we present Harness-1, a 20-billion-parameter search agent (functioning as a retrieval subagent) trained using reinforcement learning within a stateful search harness. This harness manages environment-side working memory, encompassing a candidate pool, a curated set tagged with importance levels, compact evidence links, verification logs, compressed and deduplicated observations, and budget-conscious context rendering. The policy continues to oversee semantic decisions, such as determining search queries, selecting documents for retention or discard, deciding what requires verification, and identifying the appropriate time to conclude. In evaluations across eight retrieval benchmarks covering the web, finance, patents, and multi-hop question answering, Harness-1 attained an average curated recall of 0.730. This performance surpasses the leading open search subagent by 11.4 points and remains competitive against significantly larger frontier models. The improvements were particularly pronounced on held-out transfer benchmarks, indicating that reinforcement learning applied to explicit search state can yield retrieval behaviors that generalize effectively beyond the domains used during training. Our code is publicly accessible at https://github.com/pat-jj/harness-1.
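The division of labor described in the abstract can be illustrated with a minimal sketch. This is not the Harness-1 implementation: the class `SearchState`, its method names, the importance tags `high`/`medium`/`low`, and the character-based budget are all hypothetical, chosen only to show how an environment might hold the candidate pool, the curated set, and deduplicated observations while the policy issues only semantic decisions (keep, discard). The abstract does not define curated recall; the helper below assumes it is the fraction of relevant documents that end up in the curated set.

```python
from dataclasses import dataclass, field


@dataclass
class SearchState:
    """Hypothetical environment-side working memory (all names are illustrative)."""
    candidates: dict = field(default_factory=dict)  # doc_id -> compressed text
    curated: dict = field(default_factory=dict)     # doc_id -> importance tag
    seen_texts: set = field(default_factory=set)    # compressed texts already observed

    def observe(self, doc_id, text, max_chars=200):
        """Compress and deduplicate an observation; return True if it was new."""
        compressed = " ".join(text.split())[:max_chars]
        if compressed in self.seen_texts:
            return False
        self.seen_texts.add(compressed)
        self.candidates[doc_id] = compressed
        return True

    def keep(self, doc_id, importance):
        """Policy decision: move a known candidate into the curated set."""
        if doc_id not in self.candidates:
            raise KeyError(doc_id)
        self.curated[doc_id] = importance

    def discard(self, doc_id):
        """Policy decision: drop a document from both pools."""
        self.candidates.pop(doc_id, None)
        self.curated.pop(doc_id, None)

    def render(self, budget_chars):
        """Render curated documents, most important first, within a character budget."""
        order = {"high": 0, "medium": 1, "low": 2}
        ranked = sorted(self.curated.items(),
                        key=lambda kv: (order.get(kv[1], 3), kv[0]))
        lines, used = [], 0
        for doc_id, tag in ranked:
            line = f"[{tag}] {doc_id}: {self.candidates[doc_id]}"
            if used + len(line) > budget_chars:
                break  # stop once the next entry would exceed the budget
            lines.append(line)
            used += len(line)
        return "\n".join(lines)


def curated_recall(curated_ids, relevant_ids):
    """Assumed definition: share of relevant documents present in the curated set."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(relevant & set(curated_ids)) / len(relevant)
```

In this sketch the policy never has to remember what it has already seen or re-derive the curated list from the transcript; it only calls `keep` or `discard`, and the environment re-renders a compact view each turn.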
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




