arXiv

Self-Healing Agentic Orchestrators for Reliable Tool-Augmented Large Language Model Systems

June 2, 2026 · Rahul Suresh Babu, Adarsh Agrawal

Abstract:

The reliability of tool-augmented large language model (LLM) agents hinges on orchestration layers that manage complex operations including planning, retrieval, tool invocation, validation, memory management, and error recovery. In these environments, system failures stem not merely from model inaccuracies but also from orchestration-level complications such as tool timeouts, malformed arguments, outdated context, conflicting evidence, infinite retry loops, and unverified intermediate outputs. This study introduces a self-healing agentic orchestrator that frames reliability as a bounded runtime control challenge. The proposed system translates observable failure indicators into inferred failure categories, executes targeted recovery actions within defined budgets, validates the resulting trajectories, and logs observability traces.
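The control loop described above (signal to category, targeted recovery under a budget, validation, trace logging) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Failure` categories, the `classify` rules, the `repairs` table, and the `double` tool are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Failure(Enum):
    # Hypothetical failure categories; the paper's taxonomy may differ.
    TIMEOUT = auto()
    MALFORMED_ARGS = auto()
    UNVERIFIED_OUTPUT = auto()


@dataclass
class Trace:
    """Collects (step, detail) pairs as a simple observability trace."""
    events: list = field(default_factory=list)

    def log(self, step, detail):
        self.events.append((step, detail))


def classify(signal: str) -> Failure:
    """Map an observable failure signal to an inferred failure category."""
    if "timed out" in signal:
        return Failure.TIMEOUT
    if "invalid argument" in signal:
        return Failure.MALFORMED_ARGS
    return Failure.UNVERIFIED_OUTPUT


def run_with_healing(tool, args, verify, repairs, budget, trace):
    """Call `tool`, applying at most `budget` targeted recovery actions."""
    attempts_left = budget
    while True:
        try:
            result = tool(**args)
            if verify(result):
                trace.log("success", result)
                return result
            signal = "verification failed"
        except Exception as exc:
            signal = str(exc)
        category = classify(signal)
        trace.log("failure", category.name)
        if attempts_left == 0:
            # Bounded control: stop instead of retrying forever.
            trace.log("abort", "budget exhausted")
            return None
        attempts_left -= 1
        args = repairs[category](args)  # recovery targeted at the category
        trace.log("recover", category.name)


def double(n):
    """Toy tool that rejects non-integer input."""
    if not isinstance(n, int):
        raise ValueError("invalid argument: n must be an int")
    return n * 2


repairs = {
    Failure.MALFORMED_ARGS: lambda a: {"n": int(a["n"])},
    Failure.TIMEOUT: lambda a: a,
    Failure.UNVERIFIED_OUTPUT: lambda a: a,
}

trace = Trace()
answer = run_with_healing(
    double, {"n": "3"}, lambda r: isinstance(r, int), repairs, budget=2, trace=trace
)
```

Here the first call fails on a malformed argument, the category-specific repair casts it to an integer, and the second call passes validation. With `budget=0` the same input ends in an `abort` event rather than an unbounded retry loop.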

We assessed this methodology using a controlled fault-injection benchmark comprising 100 tasks, comparing it against baselines that utilize static workflows, simple retries, ReAct-style processing, and complete replanning. The self-healing approach attained a task success rate of 98.8%, surpassing the 94.5% achieved by retry-only methods and the 93.8% rate of full replanning. A sweep of recovery budgets revealed that self-healing consistently outperformed both retry-only and full replanning strategies across all tested limits. The most significant performance disparity occurred when limited to a single recovery attempt, where self-healing succeeded in 94.0% of cases, compared to 85.3% for retry-only and 88.2% for full replanning.
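The gaps implied by these figures can be checked with a few lines of arithmetic. The snippet below only restates the success rates quoted above and computes the differences in percentage points; the dictionary layout and variable names are illustrative.

```python
# Success rates (%) as quoted in the abstract.
overall = {"self_healing": 98.8, "retry_only": 94.5, "full_replanning": 93.8}
budget_one = {"self_healing": 94.0, "retry_only": 85.3, "full_replanning": 88.2}


def gaps(rates):
    """Percentage-point lead of self-healing over each baseline."""
    best = rates["self_healing"]
    return {
        name: round(best - value, 1)
        for name, value in rates.items()
        if name != "self_healing"
    }


overall_gaps = gaps(overall)
budget_one_gaps = gaps(budget_one)
```

Under these numbers the lead over retry-only is 4.3 points overall and 8.7 points at a single recovery attempt; the lead over full replanning is 5.0 and 5.8 points respectively.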

In a controlled setting involving semantic silent failures, verifier-guided self-healing eliminated such failures entirely (0.0%), whereas non-verifying baselines frequently produced incorrect yet plausible outputs. Additionally, a compact model-in-the-loop validation demonstrated that the recovery mechanism remains effective even when a live model handles tool selection, argument generation, and answer synthesis using locally fault-injected tools. These findings offer controlled evidence that orchestration strategies incorporating failure awareness, budget constraints, and verification guidance enhance both the reliability and diagnosability of tool-augmented LLM systems.
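A semantic silent failure is one where a tool returns a well-formed but wrong result without raising an error. The sketch below illustrates, under assumed names, why a verifier matters in that case: `stale_lookup`, `good_lookup`, and the ID-matching check are hypothetical stand-ins, not the paper's benchmark tools or verifier.

```python
def stale_lookup(user_id):
    # Fault-injected tool: returns a plausible record for the wrong user
    # and raises no exception.
    return {"user_id": user_id + 1, "email": "someone@example.com"}


def good_lookup(user_id):
    # Fallback tool that returns the record actually requested.
    return {"user_id": user_id, "email": f"user{user_id}@example.com"}


def verify(requested_id, record):
    """Check that the returned record matches the request."""
    return record.get("user_id") == requested_id


def baseline(tool, user_id):
    """Non-verifying baseline: accepts whatever the tool returns."""
    return tool(user_id)


def verified(tools, user_id):
    """Try tools in order; return only a record that passes verification."""
    for tool in tools:
        record = tool(user_id)
        if verify(user_id, record):
            return record
    return None  # surface the failure instead of returning a wrong answer


silent = baseline(stale_lookup, 7)
healed = verified([stale_lookup, good_lookup], 7)
unrecoverable = verified([stale_lookup], 7)
```

The baseline hands back the wrong user's record as if nothing went wrong, while the verified path either recovers through the fallback or reports that no valid answer was found.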
