Adaptive Auto-Harness: Sustained Self-Improvement for Agentic System Deployment on Open-Ended Task Streams
Title: Adaptive Auto-Harness: Enabling Sustained Self-Improvement for Agentic Systems in Open-Ended Task Environments
Abstract:
Auto-harness platforms such as Meta-Harness, A-Evolve, and GEPA improve Large Language Model (LLM) agents by refining prompts, tools, memories, skills, and infrastructure based on execution feedback, but they are typically validated on static, offline benchmarks. Real-world deployments instead face open-ended task streams with three properties: history that grows without bound, task heterogeneity that calls for specialized harnesses, and shifting problem distributions. Under these dynamics a single, densely updated harness becomes brittle, and its accuracy often degrades after an initial peak. To address this, we propose sustained harness construction through task-specific adaptation. We present Adaptive Auto-Harness, a framework and system designed for such dynamic streams. Theoretically, the framework decomposes the distance to an optimal oracle harness into two components: evolution loss and adaptation loss. Practically, the system mitigates these losses with a stateful multi-agent evolver, a harness tree equipped with solve-time routing, and human-in-the-loop steering for cases where historical data lacks sufficient signal. Empirical results on prediction-market, security-competition, and event-forecasting streams show that Adaptive Auto-Harness outperforms five existing auto-harness baselines. Ablation studies indicate that the improvements stem from better harness construction, effective routing, and precise human steering. The source code is publicly available at https://github.com/A-EVO-Lab/AdaptiveHarness.
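The abstract does not specify how the harness tree or its solve-time routing is implemented. As a rough illustration only, the idea of routing each incoming task to the most specialized applicable harness can be sketched as a tree descent. Everything in this sketch is an assumption made for illustration and is not taken from the paper or its repository: the `HarnessNode` class, the `route` function, the predicate-based matching, and the example domains and field names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical task representation: a flat dict of string attributes.
Task = Dict[str, str]


@dataclass
class HarnessNode:
    """One node of an illustrative harness tree (hypothetical structure)."""
    name: str
    matches: Callable[[Task], bool]  # does this harness apply to the task?
    children: List["HarnessNode"] = field(default_factory=list)


def route(node: HarnessNode, task: Task) -> HarnessNode:
    """Descend to the deepest node whose predicate accepts the task.

    The first matching child wins; if no child matches, the current
    (more general) node handles the task.
    """
    for child in node.children:
        if child.matches(task):
            return route(child, task)
    return node


# Example tree: a general root with two specialized branches.
root = HarnessNode("general", lambda t: True, [
    HarnessNode("forecasting", lambda t: t.get("domain") == "forecasting", [
        HarnessNode("prediction-markets", lambda t: t.get("source") == "market"),
    ]),
    HarnessNode("security", lambda t: t.get("domain") == "security"),
])

if __name__ == "__main__":
    task = {"domain": "forecasting", "source": "market"}
    print(route(root, task).name)
```

In this sketch a task that matches no specialized branch falls back to the general root harness, which is one simple way to keep routing total when the task distribution shifts. A real system would likely use a learned or LLM-based router rather than fixed predicates.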
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




