arXiv

From Agent Traces to Trust: Evidence Tracing and Execution Provenance in LLM Agents

June 4, 2026 · Yiqi Wang, Jiaqi Zhang, Taotao Cai, Zirui Liu, Qingqiang Sun, Zequn Sun, Zhangkai Wu, Mingkai Zhang, Yanming Zhu


Abstract

As Large Language Model (LLM)-based agents grow more capable, they increasingly solve complex tasks by interacting with external tools, retrieval systems, memory structures, environments, and other agents. This connectivity increases autonomy, but it also makes agent behavior harder to verify, debug, and audit. Final-answer accuracy alone does not reveal how an answer was produced, which evidence supports a given claim, why a tool was invoked, how memory influenced later decisions, or where an execution failure originated. Evidence tracing and execution provenance address this gap by recording the connections among retrieved evidence, tool outputs, memory entries, environmental observations, intermediate assertions, actions, and final responses throughout the agent’s lifecycle.

This survey offers a comprehensive review and a conceptual framework dedicated to evidence tracing and execution provenance within LLM agents. We structure the existing literature through a unified provenance lens, linking retrieval grounding, claim support, tool-use safety, memory lineage, observability, debugging, auditing, and recovery. Our proposed taxonomy categorizes trace sources, evidence and execution units, provenance relations, tracing granularity and timing, representation formats, and trust functions.

We examine several key methodological areas, including provenance representation, evidence attribution, tool-use provenance, runtime guardrails, provenance-bearing memory, trace-based observability, and failure diagnosis. Additionally, we align current benchmarks, datasets, and evaluation metrics with provenance-related capabilities, advocating for a shift in evaluation standards from mere final-answer correctness to process-level accountability. The paper concludes by highlighting open challenges, such as the development of unified trace schemas, claim-level and semantic provenance, provenance-aware safety mechanisms, realistic execution-trace benchmarks, recovery-oriented evaluation methods, and privacy-conscious audit infrastructure.
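To make the idea of provenance relations concrete, the following is a minimal sketch of a provenance graph in Python. It is an illustration only and is not taken from the survey: the class names (`Node`, `ProvenanceGraph`), the node kinds, and the relation labels (`supported_by`, `derived_from`, `depends_on`) are all hypothetical choices. It shows how a final answer could be traced back through an intermediate claim to the retrieved evidence and tool output behind it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """One traced unit. All field names here are illustrative assumptions."""
    id: str
    kind: str  # e.g. "evidence", "tool_output", "memory", "claim", "answer"
    text: str


@dataclass
class ProvenanceGraph:
    nodes: dict = field(default_factory=dict)
    # child id -> list of (relation label, parent id)
    parents: dict = field(default_factory=dict)

    def add(self, node: Node) -> None:
        self.nodes[node.id] = node
        self.parents.setdefault(node.id, [])

    def link(self, child_id: str, relation: str, parent_id: str) -> None:
        """Record that `child_id` stands in `relation` to `parent_id`."""
        self.parents[child_id].append((relation, parent_id))

    def trace(self, node_id: str) -> list:
        """Return the ids of all upstream nodes, each listed once."""
        seen, stack, order = set(), [node_id], []
        while stack:
            current = stack.pop()
            for _relation, parent in self.parents[current]:
                if parent not in seen:
                    seen.add(parent)
                    order.append(parent)
                    stack.append(parent)
        return order


# Hypothetical trace: one retrieved document, one tool call, one claim, one answer.
g = ProvenanceGraph()
g.add(Node("doc1", "evidence", "Retrieved passage about the topic"))
g.add(Node("tool1", "tool_output", "Calculator result: 42"))
g.add(Node("claim1", "claim", "Intermediate assertion made by the agent"))
g.add(Node("answer", "answer", "Final response"))

g.link("claim1", "supported_by", "doc1")
g.link("claim1", "derived_from", "tool1")
g.link("answer", "depends_on", "claim1")

upstream = g.trace("answer")
# Keep only the original sources, dropping intermediate claims.
sources = [n for n in upstream if g.nodes[n].kind != "claim"]
print(upstream)
print(sources)
```

Storing edges from child to parent keeps backward tracing (from an answer to its sources) simple, which is the direction an audit typically needs. A real trace schema would likely also carry timestamps, granularity, and access controls, which this sketch omits.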

Source: arXiv