TRACE: Trajectory Risk-Aware Compression for Long-Horizon Agent Safety
Abstract:
Safety violations in long-horizon LLM agents often involve risk signals that are sparse, delayed, and compositional, so they can slip past local moderation inside an extended trajectory. Current detection methods, which typically operate turn by turn or over short context windows, fail to retain and aggregate this evidence across long trajectories. To address this limitation, we introduce TRACE (Trajectory Risk-Aware Compression for Long-Horizon Agent Safety), which reframes long-horizon safety detection as a problem of trajectory-level evidence compression.
TRACE employs a Compressor-Reader architecture. Guided by trajectory-level supervision, the Compressor encodes the entire trajectory into a compact latent evidence state. The Reader then evaluates the raw trajectory, using this latent state as a safety reference. This design aggregates dispersed risk indicators and mitigates the premature loss of critical evidence.
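The data flow of such a Compressor-Reader split can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: steps are represented as hand-written two-dimensional feature vectors, the Compressor is approximated by attention pooling with a fixed query vector, and the Reader by a softmax over each step's similarity to the pooled state. The names `compress`, `read`, and `queries` are hypothetical, and nothing here is trained.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def compress(trajectory: List[Vector], queries: List[Vector]) -> List[Vector]:
    """Toy Compressor: attention-pool all steps into one latent slot per query."""
    dim = len(trajectory[0])
    latent = []
    for query in queries:
        weights = softmax([dot(query, step) for step in trajectory])
        slot = [
            sum(w * step[d] for w, step in zip(weights, trajectory))
            for d in range(dim)
        ]
        latent.append(slot)
    return latent


def read(trajectory: List[Vector], latent: List[Vector]) -> List[float]:
    """Toy Reader: re-read the raw steps, weighting each one by its
    similarity to the latent reference. Returns attention over steps."""
    relevance = [max(dot(slot, step) for slot in latent) for step in trajectory]
    return softmax(relevance)


# Hypothetical features: dimension 0 is a "risk cue", dimension 1 is benign.
# Two risk-bearing steps (indices 2 and 5) are separated by benign steps.
trajectory = [[0.0, 1.0], [0.0, 1.0], [2.0, 0.0],
              [0.0, 1.0], [0.0, 1.0], [2.0, 0.0]]
queries = [[1.0, 0.0]]  # assumed fixed; a real system would learn this

latent = compress(trajectory, queries)
attention = read(trajectory, latent)
```

In this toy setting the pooled state is dominated by the risk-cue dimension, so the Reader's attention concentrates on the two separated risk-bearing steps rather than on the benign steps between them.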
On ASSEBench, Pre-Ex-Bench, and R-Judge, TRACE achieves the highest accuracy with every tested backbone, surpassing strong baselines by as much as 12.6 percentage points. On the LongSafety benchmark, TRACE is also more stable, with less performance degradation as context length increases. Attention visualizations and case studies indicate that the compressed reference enables the Reader to prioritize risk-critical segments and reconstruct evidence spanning multiple steps. The source code is publicly available at https://github.com/Peregrine123/TRACE_official.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




