arXiv

Locality Does Not Imply Reachability: Boundary Repair in Block-Sparse Causal Attention

June 3, 2026 · Zhibo Yang


Abstract:

Sparse causal attention is usually justified by sequence locality: nearby tokens should stay directly accessible, while distant ones can be dropped to save computation. This work shows that sequence locality and reachability in the attention graph can come apart. Under fixed block-causal attention, two adjacent tokens that straddle a block boundary can remain disconnected in the attention graph at every depth.
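A minimal sketch of the disconnection, assuming the simplest fixed block-causal mask: query `t` attends to key `s` iff `s <= t` and both fall in the same block of size `block`. The helper `block_causal_mask` is hypothetical and only illustrates the mask shape, not the paper's implementation.

```python
def block_causal_mask(n: int, block: int) -> list[list[bool]]:
    """mask[t][s] is True when query t may attend to key s.

    Hypothetical fixed block-causal mask: causal (s <= t) and
    restricted to the same block of size `block`.
    """
    return [[s <= t and s // block == t // block for s in range(n)]
            for t in range(n)]


n, B = 8, 4
mask = block_causal_mask(n, B)
# Token 4 opens the second block; token 3 is its immediate left neighbour.
print(mask[4][3])  # False: adjacent in the sequence, yet not attended
print(mask[5][4])  # True: both tokens sit in the same block
```

The tokens at positions 3 and 4 are one step apart in the sequence, but no edge connects them, which is exactly the gap between locality and reachability.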

We characterize this "boundary artifact" with structural dependency sets. If every attention layer uses the same fixed block-causal mask and all other operations are positionwise, a target representation can depend only on tokens in its own block prefix, however many layers are stacked. On a constructed K-way boundary-copy distribution, this yields an architecture-level boundary-copy separation: because the token to be copied lies outside the target's dependency set, top-1 accuracy is at most 1/K and expected cross-entropy is at least log K.
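The dependency-set argument can be checked mechanically. The sketch below, under the same assumed block-causal mask as above, propagates the set of input positions that can influence each output through `depth` identical layers, treating every non-attention operation as positionwise. The function names are illustrative, not from the paper.

```python
def block_causal_mask(n: int, block: int) -> list[list[bool]]:
    # Hypothetical fixed block-causal mask: s <= t and same block.
    return [[s <= t and s // block == t // block for s in range(n)]
            for t in range(n)]


def dependency_sets(mask: list[list[bool]], depth: int) -> list[set[int]]:
    """Input positions that can influence each output after `depth` layers.

    Every layer reuses the same mask; all other operations are treated as
    positionwise, so they never mix information across positions.
    """
    n = len(mask)
    deps = [{t} for t in range(n)]  # before any attention layer
    for _ in range(depth):
        # Output t mixes the current states of every key it attends to.
        deps = [set().union(*(deps[s] for s in range(n) if mask[t][s]))
                for t in range(n)]
    return deps


n, B = 12, 4
mask = block_causal_mask(n, B)
for depth in (1, 2, 8):
    deps = dependency_sets(mask, depth)
    # The set never grows past the block prefix [t - t % B, t].
    print(depth, all(deps[t] == set(range(t - t % B, t + 1)) for t in range(n)))
```

Stacking more layers does not help: the mask is closed under composition within a block, so the dependency set reaches a fixed point after the first layer.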

We then derive phase-conditioned coverage functions, which show that reachability depends on both the source-target distance and the target's offset (phase) within its block. These coverage laws predict when a sparse pattern is likely to fail, when boundary repair should help, and why sliding-window attention is not a drop-in substitute for boundary repair.
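To make "phase-conditioned" concrete, here is a simplified coverage model, not the paper's exact coverage laws. Assumptions: under a fixed block-causal mask of block size `B`, a source `d` tokens back is reachable from a target at in-block offset `r` iff `d <= r` (at any depth, per the dependency-set result); a sliding window of width `w` stacked `depth` times reaches `depth * (w - 1)` tokens back regardless of phase; sequence-start effects are ignored. Averaging over the `B` possible offsets gives a coverage value per distance.

```python
from fractions import Fraction


def block_reach(d: int, r: int) -> bool:
    # Simplified model: the source is in the target's block prefix iff d <= r.
    return d <= r


def block_coverage(d: int, B: int) -> Fraction:
    # Fraction of in-block offsets r in [0, B) from which distance d is reachable.
    return Fraction(sum(block_reach(d, r) for r in range(B)), B)


def window_coverage(d: int, w: int, depth: int) -> Fraction:
    # Sliding window (self + w - 1 predecessors) stacked `depth` times:
    # reach grows by w - 1 per layer and does not depend on block phase.
    return Fraction(1 if d <= depth * (w - 1) else 0)


B, w, depth = 4, 2, 1
for d in (1, 2, 3, 4):
    print(d, block_coverage(d, B), window_coverage(d, w, depth))
```

In this toy setting neither pattern dominates: the window always covers `d = 1`, which the block mask misses at the boundary phase, while the block mask sometimes covers `d = 3`, which the window never reaches. That kind of incomparability is one way to see why swapping in a sliding window does not simply repair the boundary.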

As a constructive fix, we introduce Boundary Bridge Attention. It keeps the fixed block path and adds auxiliary causal edges near block boundaries; because these edges reuse the existing shared projections, they add no parameters. In controlled experiments on 1024-token sequences, the gains appeared mainly in diagnostics aligned with the coverage metrics. As secondary evidence of external validity, a fixed-checkpoint probe of Qwen2.5-7B at 8K-token context showed the same coverage-incomparability pattern.
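The sketch below shows the idea at the mask level only, under a hypothetical bridge construction: a query may additionally attend to keys in the immediately preceding block when they are at most `b` tokens back. The bridge width `b` and `bridge_mask` are illustrative assumptions, not the paper's definition. Because the extra edges only change which scores are kept, the same query/key/value projections can score them, which is consistent with the zero-additional-parameter property.

```python
def block_causal_mask(n: int, block: int) -> list[list[bool]]:
    # Hypothetical fixed block-causal mask: s <= t and same block.
    return [[s <= t and s // block == t // block for s in range(n)]
            for t in range(n)]


def bridge_mask(n: int, block: int, b: int) -> list[list[bool]]:
    """Block path plus hypothetical bridge edges of width b.

    Bridge edge (t, s): s lies in the block just before t's block and
    0 < t - s <= b, so the edge straddles a block boundary.
    """
    base = block_causal_mask(n, block)
    return [[base[t][s] or (s // block == t // block - 1 and 0 < t - s <= b)
             for s in range(n)] for t in range(n)]


def dependency_sets(mask: list[list[bool]], depth: int) -> list[set[int]]:
    # Same propagation as before: positionwise ops, one shared mask per layer.
    n = len(mask)
    deps = [{t} for t in range(n)]
    for _ in range(depth):
        deps = [set().union(*(deps[s] for s in range(n) if mask[t][s]))
                for t in range(n)]
    return deps


m = bridge_mask(8, 4, b=1)
print(m[4][3])                            # the boundary neighbour is now attended
print(sorted(dependency_sets(m, 2)[4]))   # reach crosses into the previous block
```

Under this construction, token 4 reaches its left neighbour after one layer and the whole previous block after two, while the original block path stays intact.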

The main contribution of this work is a theory-driven diagnostic framework for the mismatch between locality and reachability in block-sparse causal attention, supported by phase-conditioned coverage analysis and a minimal, effective constructive repair.

Source: arXiv Generated at: 2026-06-03 00:00:00 UTC