arXiv

Task Structure Reverses Layerwise State Encoding in Sequence Models

Title: Task Dependency Reverses Layerwise State Encoding Patterns in Sequence Models

Abstract:

Mechanistic analyses of sequence models typically characterize layerwise state encodings as fixed architectural features, noting that recurrent architectures tend to concentrate readable state while attention-based models distribute it. We find that this profile is not static: it reverses depending on the task. Across Transformers, Mamba, Mamba-2, LSTMs, and GRUs, state encoding on the Parity task is concentrated late in Mamba and the recurrent baselines, whereas it builds gradually in Transformers. This pattern inverts on the bounded-depth Dyck-k task. Similar reversals occur in fine-tuned Mamba-130M and Pythia-160M models, and the Pythia Dyck bottleneck remains evident at the 410M parameter scale.

The literature often conflates two distinct explanations for these behaviors: algebraic structure (specifically commutativity) and computational structure (prefix updates versus stack-like mechanisms). To disentangle these factors, we introduce a third task, non-commutative S_3 permutation composition. Both probing across all five architectures and Mamba-specific Conv1D attribution show that S_3 groups with Parity rather than with Dyck. This alignment indicates that layerwise probing tracks computational structure rather than commutativity.
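The distinction between a prefix update and a stack-like mechanism can be made concrete with a minimal sketch of the three tasks' running states. The token encodings below (bits for Parity, index tuples for S_3 elements, signed integers for brackets) and all function names are assumptions chosen for illustration; the abstract does not specify the paper's actual formats.

```python
def parity_states(bits):
    """Running parity after each token: a fixed-size prefix update (commutative)."""
    state, out = 0, []
    for b in bits:
        state ^= b
        out.append(state)
    return out


def compose(p, q):
    """Apply permutation p first, then q; both are tuples over {0, 1, 2}."""
    return tuple(q[p[i]] for i in range(3))


def s3_states(perms):
    """Running product in S_3: still a fixed-size prefix update, but non-commutative."""
    state, out = (0, 1, 2), []
    for p in perms:
        state = compose(state, p)
        out.append(state)
    return out


def dyck_states(tokens, max_depth):
    """Stack contents after each token for bounded-depth Dyck-k.

    Assumed encoding: +t opens bracket type t, -t closes it.
    Returns None if the string is unbalanced so far or exceeds max_depth.
    """
    stack, out = [], []
    for t in tokens:
        if t > 0:
            if len(stack) == max_depth:
                return None
            stack.append(t)
        else:
            if not stack or stack[-1] != -t:
                return None
            stack.pop()
        out.append(tuple(stack))
    return out


swap, cycle = (1, 0, 2), (1, 2, 0)
# Order matters in S_3, unlike in Parity.
print(compose(swap, cycle) != compose(cycle, swap))
```

In this sketch Parity and S_3 share the same loop shape (one state, one update per token) and differ only in whether the update commutes, while Dyck needs a state whose size varies with nesting depth.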

Causal interventions on 4-layer formal models reveal that linearly readable directions are often functionally critical and retain significance even at out-of-distribution lengths for both Parity and Dyck tasks. However, the dynamics change at pretrained scales. Fine-tuned Pythia models exhibit a strong bottleneck in middle layers; ablating layers L6-L7 in the 160M model reduces accuracy by approximately 81%, while a broader plateau spanning L4-L18 persists at 410M, despite the effect being weaker at the best-probed layer. In contrast, pretrained Mamba models display a complementary failure mode: while their final layer is highly readable, no single probe direction breaks the task on Parity, Dyck, or S_3. Instead, mid-position activation patching in the final layer recovers about 97-98% of the clean-corrupted logit gap. These results suggest that probing identifies where state is linearly accessible, which does not always coincide with where computation is bottlenecked. Ultimately, mechanistic signatures emerge from the interaction between architecture and task.
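The two interventions mentioned above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: `ablate_direction` assumes that ablating a probe direction means projecting it out of the activation, and `logit_gap_recovery` assumes the usual normalization of a patched logit difference by the clean-corrupted gap. Both function names and all numeric inputs are hypothetical.

```python
def ablate_direction(h, w):
    """Remove the component of activation h along probe direction w.

    Computes h - (h.w / w.w) * w, so the result is orthogonal to w.
    """
    ww = sum(x * x for x in w)
    if ww == 0:
        return list(h)  # a zero direction removes nothing
    coef = sum(a * b for a, b in zip(h, w)) / ww
    return [a - coef * b for a, b in zip(h, w)]


def logit_gap_recovery(clean, corrupted, patched):
    """Fraction of the clean-corrupted logit gap restored by patching.

    1.0 means the patched run matches the clean run; 0.0 means no recovery.
    """
    gap = clean - corrupted
    if gap == 0:
        raise ValueError("clean and corrupted logits are identical")
    return (patched - corrupted) / gap


# Hypothetical numbers, chosen only to exercise the functions.
print(ablate_direction([3.0, 4.0], [1.0, 0.0]))
print(logit_gap_recovery(clean=4.0, corrupted=0.0, patched=3.0))
```

Under these definitions, a model can score high on readability (a probe along `w` decodes the state) while ablating `w` leaves task accuracy intact, which is the dissociation the abstract describes for pretrained Mamba.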


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
