Your Autoregressive Model Already Reveals the Causal Graph
Abstract: Autoregressive models trained by next-token prediction inherently capture the conditional independence structure of their underlying data-generating process. We exploit this fact to enable scalable causal discovery from a single observed sequence of discrete events, with no task-specific retraining. This single-stream setting is common in domains such as patient trajectories, manufacturing systems, and vehicle diagnostics, yet it is challenging: with no repeated samples, large event vocabularies, and long-range temporal dependencies, existing techniques tend to be either computationally prohibitive or inaccurate. To address this, we present TRACE, a framework that uses any pretrained autoregressive model as a density estimator for conditional mutual information, the core quantity in conditional independence testing. TRACE runs conditional independence tests in parallel on GPUs and recovers both the summary projection and the sample-level time causal graph. The method scales linearly with vocabulary size and accounts for delayed causal effects. Furthermore, we show that minimizing the standard cross-entropy loss used in pretraining directly reduces an upper bound on causal identification error, revealing a duality between sequence prediction and causal discovery. In evaluations on nonlinear structural causal models (vocabulary size 8,000) and real-world vehicle diagnostic logs (vocabulary size 29,100), TRACE is the first viable solution at this scale and surpasses the best-performing baseline by more than 20 F1 points.
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC
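
The abstract names conditional mutual information (CMI) as the core quantity behind conditional independence testing. As a minimal illustration of that quantity only, and not of TRACE itself, the Python sketch below computes I(X; Y | Z) from an explicit joint probability table. The function name, the toy distributions, and the use of an exact table in place of probabilities read off a pretrained autoregressive model are all assumptions made for this example.

```python
import numpy as np


def conditional_mutual_information(p_xyz):
    """Return I(X; Y | Z) in nats for a joint table indexed as p[x, y, z].

    Hypothetical helper for illustration: it assumes the joint distribution
    is known exactly, whereas the abstract describes estimating the needed
    densities with a pretrained autoregressive model.
    """
    p = np.asarray(p_xyz, dtype=float)
    p = p / p.sum()  # normalize defensively

    # Marginals, kept broadcastable against the full (X, Y, Z) table.
    p_z = p.sum(axis=(0, 1), keepdims=True)
    p_xz = p.sum(axis=1, keepdims=True)
    p_yz = p.sum(axis=0, keepdims=True)

    # I(X;Y|Z) = sum_{x,y,z} p(x,y,z) * log( p(x,y,z) p(z) / (p(x,z) p(y,z)) )
    mask = p > 0  # cells with zero probability contribute nothing
    ratio = np.ones_like(p)
    np.divide(p * p_z, p_xz * p_yz, out=ratio, where=mask)
    return float(np.sum(p[mask] * np.log(ratio[mask])))


# Toy case 1: X and Y depend only on Z, so X is independent of Y given Z.
p_z = np.array([0.5, 0.5])
p_x_given_z = np.array([[0.9, 0.1], [0.2, 0.8]])  # rows indexed by z
p_y_given_z = np.array([[0.7, 0.3], [0.4, 0.6]])  # rows indexed by z
p_independent = np.einsum("z,zx,zy->xyz", p_z, p_x_given_z, p_y_given_z)
cmi_independent = conditional_mutual_information(p_independent)

# Toy case 2: Y is a copy of X regardless of Z, so the dependence remains
# after conditioning on Z.
p_dependent = np.zeros((2, 2, 2))
for z in range(2):
    for x in range(2):
        p_dependent[x, x, z] = 0.25
cmi_dependent = conditional_mutual_information(p_dependent)
```

In the first toy case the CMI is zero up to floating-point error, which a conditional independence test would read as "no direct link between X and Y given Z". In the second it equals log 2 nats, the entropy of a fair binary variable. A practical test would also need a decision threshold and a way to handle estimation noise, which this sketch does not address.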



