When Do Attention Circuits Form? Developmental Trajectories of Capability and Attention-Sink Emergence Across Three 1B-Class Architectures
Title: Tracing the Emergence of Attention Circuits: Developmental Pathways in Capability and Attention-Sink Formation Across Three 1B-Scale Architectures
Abstract: This study maps the developmental trajectory of attention-head circuit formation across three 1B-class language models, encompassing two architectural families (dense transformers and mixture-of-experts) and two pretraining datasets (The Pile and DCLM): Pythia 1B, OLMo 1B-0724-hf, and OLMoE 1B-7B-0924. We conducted 30 mechanistic-interpretability runs in total, applying a participation-ratio (PR) spectral signal and an all-head capability-specific selectivity screen at each of 10 log-spaced checkpoints per model. This approach allowed us to monitor the emergence of induction, previous-token, and BOS-attractor heads. Our analysis yields five key findings. First (F1), Layers 0 and 1 consistently produce zero BOS-classified heads across all revisions and models, indicating that this "L0/L1 zero-BOS floor" is an inherent architectural feature rather than a learned result. Second (F2), the fraction of whole-model BOS-attractors follows a model-dependent emergence pattern: a gradual ramp in Pythia 1B, a sharp phase transition in OLMo 1B (jumping from 7% to 70% between adjacent checkpoints), and a gradual ramp in OLMoE 1B-7B. Third (F3), in models trained on DCLM, the formation of induction circuits precedes that of BOS-attractors by a factor of 10 to 20 in token count. Furthermore, capability-circuit formation and attention-sink formation represent two separate transitions rather than a single event. Fourth (F4), the capability-specific screen identifies the final induction circuit within just 0.3% to 2% of total training tokens, demonstrating that circuit identification does not necessitate access to the final model. Fifth (F5), for every final-checkpoint induction head sampled across the three models, the per-head PR was elevated at or before the first revision where the head crossed its capability-selectivity threshold.
These results refine the understanding of the induction phase transition: in 1B-class models trained on DCLM, the induction transition and the attention-sink transition are separated by an order of magnitude in token count and display qualitatively different shapes.
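For readers unfamiliar with the participation ratio, the following is a minimal sketch of the standard definition, PR = (Σλ)² / Σλ², where λ are the eigenvalues of a covariance matrix. The abstract does not state which matrix the per-head PR is computed over, so applying it to the covariance of a head's output activations is an assumption here, and the function names `participation_ratio` and `participation_ratio_from_activations` are hypothetical. The second function uses the identities Σλ = tr(C) and Σλ² = ‖C‖_F² for symmetric C, which avoids an explicit eigendecomposition.

```python
from typing import List, Sequence


def participation_ratio(eigenvalues: Sequence[float]) -> float:
    """Standard participation ratio of a non-negative spectrum.

    Equals 1 when one direction carries all the variance and
    len(eigenvalues) when the variance is spread evenly.
    """
    total = sum(eigenvalues)
    sum_sq = sum(v * v for v in eigenvalues)
    if sum_sq == 0.0:
        return 0.0  # degenerate spectrum: no variance at all
    return total * total / sum_sq


def participation_ratio_from_activations(rows: List[List[float]]) -> float:
    """PR of the covariance of `rows` (samples x features).

    Assumption: `rows` holds one head's output vectors, one per token.
    Uses tr(C)^2 / ||C||_F^2, which equals (sum λ)^2 / sum λ^2.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(c[i] * c[j] for c in centered) / n for j in range(d)]
        for i in range(d)
    ]
    trace = sum(cov[i][i] for i in range(d))
    frob_sq = sum(x * x for row in cov for x in row)
    if frob_sq == 0.0:
        return 0.0
    return trace * trace / frob_sq


if __name__ == "__main__":
    # Flat spectrum over four directions versus a single dominant direction.
    print(participation_ratio([1.0, 1.0, 1.0, 1.0]))
    print(participation_ratio([1.0, 0.0, 0.0, 0.0]))
```

Under this definition, a head whose outputs collapse onto one direction has a PR near 1, while a head that uses many directions equally has a PR near its output dimensionality. Which of these counts as "elevated" in F5 depends on the paper's baseline, which this excerpt does not specify.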
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




