arXiv

Forget Attention: Importance-Aware Attention Is All You Need

June 2, 2026 · Soohyeong Shin, Yeongwook Yang

Title: Beyond Standard Attention: Leveraging Importance-Aware Mechanisms as the Core Component

Abstract: A persistent hurdle in hybrid language modeling lies in effectively merging the global retrieval capabilities of attention mechanisms with the sequential importance signaling inherent to state space models (SSMs). While Transformers excel at accessing all context simultaneously, they lack inherent prioritization; conversely, SSMs identify critical information but struggle with revisiting past states. Current hybrid approaches, such as Jamba (which operates at the block level) and Hymba (which functions at the head level), isolate these two components into distinct compartments, preventing them from influencing one another during the actual attention computation.

To address this, we introduce SISA (SSM-Informed Softmax Attention), a method that integrates an SSM-derived importance signal directly into the attention score. The entire operation runs as a single scaled dot-product attention (SDPA) call on augmented query and key vectors, eliminating the need for recurrent states or custom kernels.
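The abstract does not say how the query and key vectors are augmented, so the following is only an illustrative sketch of one generic way an additive per-key score can be folded into a single SDPA call. It assumes the importance signal is one scalar per key position (the `importance` list, a hypothetical stand-in for whatever the SSM produces) and that it enters the attention logit additively. The helper names `sdpa_weights`, `biased_weights`, and `augment` are invented for this example and are not taken from the paper.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sdpa_weights(q, keys):
    # Plain scaled dot-product attention weights for one query (no mask).
    scale = 1.0 / math.sqrt(len(q))
    return softmax([scale * dot(q, k) for k in keys])

def biased_weights(q, keys, importance):
    # Reference computation: softmax(q.k_j / sqrt(d) + importance_j),
    # with the per-key score added explicitly to each logit.
    scale = 1.0 / math.sqrt(len(q))
    return softmax([scale * dot(q, k) + s for k, s in zip(keys, importance)])

def augment(q, keys, importance):
    # Assumed construction (not from the paper): append one extra dimension
    # so that unmodified SDPA on the (d+1)-dim vectors gives the same logits.
    #   q_aug   = [q * sqrt((d+1)/d), sqrt(d+1)]
    #   k_aug_j = [k_j, importance_j]
    # Then q_aug.k_aug_j / sqrt(d+1) = q.k_j / sqrt(d) + importance_j.
    d = len(q)
    rescale = math.sqrt((d + 1) / d)
    q_aug = [x * rescale for x in q] + [math.sqrt(d + 1)]
    k_aug = [list(k) + [s] for k, s in zip(keys, importance)]
    return q_aug, k_aug

# Toy inputs: one query, three keys of dimension 4, one score per key.
q = [0.5, -1.0, 2.0, 0.25]
keys = [
    [1.0, 0.0, 0.5, -0.5],
    [0.2, 0.3, -1.0, 0.8],
    [-0.7, 1.1, 0.4, 0.0],
]
importance = [0.0, 1.5, -0.7]

q_aug, k_aug = augment(q, keys, importance)
fused = sdpa_weights(q_aug, k_aug)              # single plain SDPA call
explicit = biased_weights(q, keys, importance)  # bias added by hand

# The two should agree up to floating-point rounding.
print(max(abs(a - b) for a, b in zip(fused, explicit)))
```

Under these assumptions, the importance term never needs a separate bias tensor or a custom kernel: it travels inside the key vectors, and any stock SDPA routine that applies the usual `1/sqrt(dim)` scaling reproduces the biased softmax.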

Benchmarks show clear gains: at the 152 million parameter scale with 5 billion training tokens, SISA achieves a LAMBADA-greedy score of 17.3%, outperforming both the standard Transformer (13.9%) and Mamba-3 (15.5%). Furthermore, SISA reaches 100% Needle-In-A-Haystack (NIAH) retrieval accuracy from training step 1K onward, converging 7 times faster than the Transformer. At the 369 million parameter scale, although Mamba-3 leads on LAMBADA, SISA maintains perfect NIAH scores and retains standard SDPA execution efficiency. Consequently, SISA establishes a new "score-level fusion" design axis for SSM-attention hybrids, moving beyond the previously dominant block-level and head-level paradigms.
