arXiv

SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space

June 4, 2026 · Zhenyi Shen, Junru Lu, Lin Gui, Jiazheng Li, Yulan He, Di Yin, Xing Sun

Abstract:

Sparse attention mitigates the quadratic computational cost of full self-attention, but it faces two obstacles. First, an "attention gap" arises when sparse attention is applied at inference to a model trained with full attention: the mismatch between training and inference distributions degrades performance. Second, a "capability gap" arises in models trained exclusively with sparse attention: gradient flow is incomplete, which keeps these models from reaching the performance of their full-attention counterparts. To address both, we introduce SSA (Sparse Sparse Attention), a training framework that aligns full and sparse attention outputs bidirectionally. We provide a theoretical analysis showing that the approximation error scales linearly with the attention mass discarded by sparse processing, and we show that SSA’s alignment objective substantially reduces this error relative to baseline methods. Empirically, SSA achieves state-of-the-art results in both inference modes (full and sparse), adapts to different sparsity budgets, and handles long contexts better.
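The link between discarded attention mass and approximation error can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: top-k key selection, the single-query setup, and the helper names `full_and_sparse_attention` and `alignment_loss` are assumptions made for illustration, and the mean-squared-difference loss is a hypothetical stand-in because the abstract does not specify the alignment objective. The bound checked here, error ≤ 2 · (dropped mass) · max‖v‖, follows from the triangle inequality and may differ from the paper's exact statement.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()

def full_and_sparse_attention(q, K, V, k):
    """Return (full output, top-k sparse output, dropped attention mass) for one query.

    Top-k selection is an assumption for illustration; the abstract does not
    say which sparsity pattern SSA uses.
    """
    scores = K @ q / np.sqrt(q.shape[0])
    p = softmax(scores)                      # full attention weights
    o_full = p @ V

    keep = np.argsort(scores)[-k:]           # indices of the k largest scores
    p_sparse = np.zeros_like(p)
    p_sparse[keep] = softmax(scores[keep])   # renormalise over the kept keys
    o_sparse = p_sparse @ V

    dropped = 1.0 - p[keep].sum()            # full-attention mass that was discarded
    return o_full, o_sparse, dropped

def alignment_loss(o_full, o_sparse):
    # Hypothetical stand-in for an alignment objective: mean squared difference
    # between the two outputs in feature space.
    return float(np.mean((o_full - o_sparse) ** 2))

rng = np.random.default_rng(0)
n, d = 64, 16
q = rng.normal(size=d)
K = rng.normal(size=(n, d))
V = rng.normal(size=(n, d))

for k in (4, 16, 64):
    o_full, o_sparse, dropped = full_and_sparse_attention(q, K, V, k)
    err = np.linalg.norm(o_full - o_sparse)
    bound = 2.0 * dropped * np.linalg.norm(V, axis=1).max()
    print(f"k={k:2d}  dropped={dropped:.4f}  error={err:.4f}  bound={bound:.4f}  "
          f"loss={alignment_loss(o_full, o_sparse):.6f}")
```

As k grows, the dropped mass shrinks and the error shrinks with it; at k = n the two outputs coincide. In a training setup, a loss of this kind would be applied with gradients flowing to one or both branches, which this forward-only sketch does not model.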

