Detection vs. Execution: Single-Bucket Probes Miss Half the Mamba-2 State Sink
Abstract:
Mechanistic interpretability often assumes that a probe which identifies a specific representational signature also isolates the circuit responsible for the corresponding computation. We show that this assumption breaks down systematically in the Mamba-2 architecture. Studying the "state sink" (disproportionate Delta-gate activation on boundary tokens, analogous to the attention sink), we find that single-bucket probes capture only a small execution layer and miss a substantially larger detection layer that shares the same representational signature.
In Mamba-2, the state sink splits into two distinct functional groups of heads. BOS-specialist heads, which constitute approximately 5% of the heads in the 2.7B model, are causally responsible for both BOS-context and newline-target predictions across various model scales and datasets. In contrast, dual heads, which account for 27–35% of the heads and are identified through multi-class aggregation of the same probe, exhibit stronger representational similarity between BOS and newline tokens but demonstrate substantially weaker causal influence when subjected to ablation. This finding underscores that representational similarity does not equate to functional equivalence.
This distinction has critical implications for downstream performance: ablating BOS-specialist heads causes RULER NIAH retrieval accuracy to plummet from 1.00 to 0.00 at a context length of 1024 in both Mamba-1 (2.8B) and Mamba-2 (2.7B). Conversely, ablating size-matched complementary heads leaves baseline performance intact. A random channel-bucketing control helps rule out substrate granularity as the sole factor, pointing instead to Mamba-2’s head-shared Delta projection as a key element. Ultimately, while probe-derived specialty can identify execution circuits, the same probe at coarse granularity also recovers detection circuits; distinguishing between the two requires class-conditional ablation rather than class-conditional cosine similarity.
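The contrast between class-conditional cosine similarity and class-conditional ablation can be illustrated with a minimal toy sketch. It is not the authors' pipeline and does not involve Mamba-2. It assumes per-head activations stored as a NumPy array, a hypothetical linear readout standing in for the downstream computation, and zero-ablation of one head at a time. All names (`class_conditional_cosine`, `ablation_effect`, `readout`) are illustrative assumptions.

```python
import numpy as np

BOS, NEWLINE = 0, 1  # hypothetical token-class labels


def class_conditional_cosine(acts, labels, class_a, class_b):
    """Per-head cosine similarity between the mean activations of two classes.

    acts: (n_tokens, n_heads, d_head), labels: (n_tokens,)
    """
    mean_a = acts[labels == class_a].mean(axis=0)  # (n_heads, d_head)
    mean_b = acts[labels == class_b].mean(axis=0)
    dot = (mean_a * mean_b).sum(axis=-1)
    norms = np.linalg.norm(mean_a, axis=-1) * np.linalg.norm(mean_b, axis=-1)
    return dot / norms


def ablation_effect(acts, labels, readout, targets, target_class):
    """Per-head increase in squared error on one class when that head is zeroed."""
    mask = labels == target_class
    x, y = acts[mask], targets[mask]

    def loss(a):
        pred = np.einsum("thd,hd->t", a, readout)
        return float(np.mean((pred - y) ** 2))

    base = loss(x)
    effects = []
    for h in range(acts.shape[1]):
        ablated = x.copy()
        ablated[:, h, :] = 0.0  # zero-ablate head h only
        effects.append(loss(ablated) - base)
    return np.array(effects)


# Toy data: head 0 drives the readout, head 1 merely looks alike on both classes.
labels = np.array([BOS, BOS, NEWLINE, NEWLINE])
acts = np.array([
    [[1.0, 0.0], [1.0, 1.0]],
    [[1.0, 0.0], [1.0, 1.0]],
    [[0.5, 0.5], [1.0, 1.0]],
    [[0.5, 0.5], [1.0, 1.0]],
])
readout = np.array([[1.0, 1.0], [0.0, 0.0]])  # head 1 has no downstream weight
targets = np.einsum("thd,hd->t", acts, readout)

cosine = class_conditional_cosine(acts, labels, BOS, NEWLINE)
effects = ablation_effect(acts, labels, readout, targets, NEWLINE)
print("cosine per head:", cosine)
print("ablation effect per head:", effects)
```

In this toy setup, head 1 has the higher BOS/newline cosine similarity but no ablation effect, while head 0 has the lower similarity and carries the whole effect. The cosine score alone would rank the heads in the wrong order for causal importance.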
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




