Task Structure Reverses Layerwise State Encoding in Sequence Models
Title: Task Dependency Reverses Layerwise State Encoding Patterns in Sequence Models
Abstract:
Mechanistic analyses of sequence models typically characterize layerwise state encodings as fixed architectural features, noting that recurrent architectures tend to concentrate readable states while attention-based models distribute them. However, our findings demonstrate that this profile is not static; it reverses depending on the specific task. By examining Transformers, Mamba, Mamba-2, LSTMs, and GRUs, we observed that for the Parity task, state encoding is concentrated late in Mamba and recurrent baselines, whereas it builds gradually in Transformers. This pattern inverts for the bounded-depth Dyck-k task. Similar reversals occur in fine-tuned Mamba-130M and Pythia-160M models, with the Pythia Dyck bottleneck remaining evident even at the 410M parameter scale.
The literature often conflates two distinct explanations for these behaviors: algebraic structure (specifically commutativity) and computational structure (distinguishing between prefix updates and stack-like mechanisms). To disentangle these factors, we introduced a third task involving non-commutative S_3 permutation composition. Probing across all five architectures and Mamba-specific Conv1D attribution revealed that S_3 groups with Parity rather than Dyck. This alignment indicates that layerwise probing tracks computational structure rather than commutativity.
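To make the contrast among the three tasks concrete, the sketch below computes the running state after each token for Parity, bounded-depth Dyck-k, and S_3 composition. It is only an illustration: the token encodings, the function names, and the choice of stack depth as the Dyck state are assumptions, not the paper's actual data format or probe targets.

```python
from itertools import permutations

def parity_states(bits):
    """Running XOR after each token: a commutative prefix update."""
    state, out = 0, []
    for b in bits:
        state ^= b
        out.append(state)
    return out

def dyck_states(tokens, max_depth):
    """Stack depth after each token, or None once the prefix is invalid.

    tokens are (bracket_type, is_open) pairs (an assumed encoding).
    """
    stack, out, valid = [], [], True
    for kind, is_open in tokens:
        if valid and is_open:
            if len(stack) >= max_depth:
                valid = False          # would exceed the depth bound
            else:
                stack.append(kind)
        elif valid:
            if not stack or stack[-1] != kind:
                valid = False          # unmatched or mismatched close
            else:
                stack.pop()
        out.append(len(stack) if valid else None)
    return out

# The six permutations of three elements, as tuples p with p[i] = image of i.
S3 = list(permutations(range(3)))

def compose(p, q):
    """Return the permutation that applies q first, then p."""
    return tuple(p[q[i]] for i in range(3))

def s3_states(perms):
    """Running composition: a prefix update like Parity, but order-sensitive."""
    state, out = (0, 1, 2), []
    for p in perms:
        state = compose(p, state)
        out.append(state)
    return out
```

Under these assumptions, Parity and S_3 both fold each token into a single fixed-size state, while Dyck must consult a stack. Swapping two S_3 tokens generally changes the final state, whereas swapping two Parity bits never does.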
Causal interventions on 4-layer formal models reveal that linearly readable directions are often functionally critical and retain significance even at out-of-distribution lengths for both Parity and Dyck tasks. However, the dynamics change at pretrained scales. Fine-tuned Pythia models exhibit a strong bottleneck in middle layers; ablating layers L6-L7 in the 160M model reduces accuracy by approximately 81%, while a broader plateau spanning L4-L18 persists at 410M, despite the effect being weaker at the best-probed layer. In contrast, pretrained Mamba models display a complementary failure mode: while their final layer is highly readable, no single probe direction breaks the task on Parity, Dyck, or S_3. Instead, mid-position activation patching in the final layer recovers about 97-98% of the clean-corrupted logit gap. These results suggest that probing identifies where state is linearly accessible, which does not always coincide with where computation is bottlenecked. Ultimately, mechanistic signatures emerge from the interaction between architecture and task.
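The two intervention quantities mentioned above can be sketched as follows. This is a minimal illustration under common conventions, not the paper's implementation: `ablate_direction` and `recovered_fraction` are hypothetical helper names, activations are plain Python lists, and the recovery metric is assumed to be the usual normalized logit-gap ratio.

```python
def ablate_direction(h, d):
    """Remove the component of activation h along probe direction d.

    Computes h - (h.d / d.d) * d, so the result is orthogonal to d.
    """
    dd = sum(x * x for x in d)
    coef = sum(a * b for a, b in zip(h, d)) / dd
    return [a - coef * b for a, b in zip(h, d)]

def recovered_fraction(clean_gap, corrupted_gap, patched_gap):
    """Share of the clean-vs-corrupted logit gap restored by patching.

    0.0 means patching changed nothing; 1.0 means it fully restored
    the clean run's logit gap (assumed normalization).
    """
    return (patched_gap - corrupted_gap) / (clean_gap - corrupted_gap)

# Example: ablating the first axis leaves only the second component.
h_ablated = ablate_direction([3.0, 4.0], [1.0, 0.0])

# Example: a patched gap of 1.5 against a clean gap of 2.0 and a
# corrupted gap of 0.0 corresponds to 75% recovery.
frac = recovered_fraction(2.0, 0.0, 1.5)
```

Read this way, a recovery of about 97-98% would mean the patched run's logit gap is nearly indistinguishable from the clean run's, even when ablating any single probe direction leaves task accuracy intact.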
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





