Causal Evidence of Stack Representations in Modeling Counter Languages Using Transformers
Abstract: Formal languages have long served as effective tools for probing the internal mechanisms of transformers. Previous research has shown that transformers trained to predict the next token in counter languages develop representations that align with an underlying stack structure. Moving beyond representational analysis, this study tests whether these representations play a causal role. We trained linear probes to predict stack depth at each token from the model’s hidden states and extracted a principal representation direction from the resulting probe. Ablating this direction from the model reduced sequential accuracy to near 0%. This result provides strong empirical evidence that the stack representation is not merely a byproduct of learning but is causally necessary for the model’s performance.
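The probe-and-ablate procedure described in the abstract can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the authors' code. The counter language a^n b^n stands in for the languages studied. Synthetic vectors, with stack depth written along a hidden direction plus noise, stand in for transformer hidden states. The probe is assumed to be a ridge regression fit via the normal equations, and the "principal direction" is assumed to be the normalized probe weight vector u. All function names (`stack_depths`, `fit_linear_probe`, `ablate`) are hypothetical. Ablation projects each hidden state onto the orthogonal complement of u, that is, h' = h - (h·u)u.

```python
import math
import random

# Hypothetical setup: a^n b^n as the counter language, with "a" pushing and
# "b" popping one stack symbol. The paper's exact languages may differ.
def stack_depths(seq):
    """Stack depth after each token: 'a' pushes, 'b' pops."""
    depth, out = 0, []
    for tok in seq:
        depth += 1 if tok == "a" else -1
        out.append(depth)
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_linear_probe(H, y, ridge=1e-3):
    """Ridge-regression probe y ~ w.h + b, solved via the normal equations."""
    X = [list(h) + [1.0] for h in H]          # append a bias column
    d = len(X[0])
    # Ridge penalty on the weights only, not on the bias term.
    A = [[sum(x[i] * x[j] for x in X) + (ridge if i == j < d - 1 else 0.0)
          for j in range(d)] for i in range(d)]
    rhs = [sum(x[i] * t for x, t in zip(X, y)) for i in range(d)]
    theta = solve(A, rhs)
    return theta[:-1], theta[-1]              # weights w, bias b


def ablate(h, u):
    """Directional ablation: remove the component of h along unit vector u."""
    c = dot(h, u)
    return [hi - c * ui for hi, ui in zip(h, u)]


# Synthetic stand-in for hidden states: depth is written along a random unit
# direction v, plus Gaussian noise in every dimension.
rng = random.Random(0)
dim = 8
v = [rng.gauss(0, 1) for _ in range(dim)]
norm_v = math.sqrt(dot(v, v))
v = [x / norm_v for x in v]

H, y = [], []
for n in range(1, 7):
    for depth in stack_depths("a" * n + "b" * n):
        H.append([depth * vi + rng.gauss(0, 0.1) for vi in v])
        y.append(depth)

w, b = fit_linear_probe(H, y)
norm_w = math.sqrt(dot(w, w))
u = [wi / norm_w for wi in w]                 # probe direction, unit length

before = [dot(w, h) + b for h in H]
after = [dot(w, ablate(h, u)) + b for h in H]
mae_before = sum(abs(p - t) for p, t in zip(before, y)) / len(y)
spread_after = max(after) - min(after)
print(f"probe MAE before ablation: {mae_before:.3f}")
print(f"spread of probe outputs after ablation: {spread_after:.2e}")
```

Because w is parallel to u, the ablated probe output collapses to the bias b for every token, so depth can no longer be read out along that direction. In the study the ablation is applied inside the model and its effect is measured through sequential accuracy; this sketch only illustrates the geometric operation and its effect on the probe's readout.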
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



