arXiv

From Agent Traces to Trust: Evidence Tracing and Execution Provenance in LLM Agents

Title: Bridging the Gap from Agent Traces to Trust: A Framework for Evidence Tracing and Execution Provenance in LLM Agents

Abstract

As Large Language Model (LLM)-based agents grow more capable, they increasingly solve complex tasks by interacting with external tools, retrieval systems, memory structures, environments, and other agents. This connectivity increases autonomy, but it also makes agent behavior harder to verify, debug, and audit. Final-answer accuracy alone cannot explain how an answer was produced, which evidence supports a given claim, why a tool was invoked, how memory influenced later decisions, or where an execution failed. Evidence tracing and execution provenance address this gap by recording the connections among retrieved evidence, tool outputs, memory entries, environmental observations, intermediate assertions, actions, and final responses throughout the agent’s lifecycle.

This survey reviews evidence tracing and execution provenance in LLM agents and proposes a conceptual framework for them. We organize the existing literature through a unified provenance lens that links retrieval grounding, claim support, tool-use safety, memory lineage, observability, debugging, auditing, and recovery. Our proposed taxonomy categorizes trace sources, evidence and execution units, provenance relations, tracing granularity and timing, representation formats, and trust functions.

We examine key methodological areas: provenance representation, evidence attribution, tool-use provenance, runtime guardrails, provenance-bearing memory, trace-based observability, and failure diagnosis. We also map current benchmarks, datasets, and evaluation metrics to provenance-related capabilities, and we argue that evaluation should move from final-answer correctness alone to process-level accountability. The paper concludes with open challenges: unified trace schemas, claim-level and semantic provenance, provenance-aware safety mechanisms, realistic execution-trace benchmarks, recovery-oriented evaluation methods, and privacy-conscious audit infrastructure.
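
To make the idea of provenance relations concrete, the following is a minimal sketch of how links among evidence, tool outputs, claims, and a final response might be recorded and queried. It is an illustration under stated assumptions, not the schema proposed in the paper: the names `Node`, `ProvenanceGraph`, `trace`, `unsupported_claims`, and `build_example`, as well as the node kinds, are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """One unit in an agent trace. The kinds used here are illustrative."""
    node_id: str
    kind: str      # e.g. "evidence", "tool_output", "memory", "claim", "response"
    content: str


@dataclass
class ProvenanceGraph:
    """Hypothetical store of 'derived from' links between trace units."""
    nodes: dict = field(default_factory=dict)
    # Maps a node id to the ids of the nodes it was derived from.
    parents: dict = field(default_factory=lambda: defaultdict(set))

    def add(self, node, derived_from=()):
        """Register a node and link it to already-registered parents."""
        for parent_id in derived_from:
            if parent_id not in self.nodes:
                raise KeyError(f"unknown parent: {parent_id}")
        self.nodes[node.node_id] = node
        for parent_id in derived_from:
            self.parents[node.node_id].add(parent_id)

    def trace(self, node_id):
        """Return the ids of every upstream node that node_id depends on."""
        seen = set()
        stack = [node_id]
        while stack:
            current = stack.pop()
            for parent_id in self.parents.get(current, ()):
                if parent_id not in seen:
                    seen.add(parent_id)
                    stack.append(parent_id)
        return seen

    def unsupported_claims(self):
        """List claims with no upstream evidence, tool output, or memory entry."""
        grounded = {"evidence", "tool_output", "memory"}
        return [
            n.node_id
            for n in self.nodes.values()
            if n.kind == "claim"
            and not any(self.nodes[p].kind in grounded for p in self.trace(n.node_id))
        ]


def build_example():
    """Build a toy trace: two claims feed one response, only one claim is grounded."""
    g = ProvenanceGraph()
    g.add(Node("e1", "evidence", "retrieved passage"))
    g.add(Node("t1", "tool_output", "calculator result"))
    g.add(Node("c1", "claim", "claim backed by e1 and t1"), derived_from=["e1", "t1"])
    g.add(Node("c2", "claim", "claim with no recorded support"))
    g.add(Node("r1", "response", "final answer"), derived_from=["c1", "c2"])
    return g
```

In this toy trace, calling `trace("r1")` on the graph from `build_example()` walks back from the final response to both claims and to the evidence and tool output behind the first claim, while `unsupported_claims()` flags the second claim. A check of this kind looks at the process that produced an answer, not only at whether the answer is correct.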


Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
