arXiv

Neural Attention Search Linear: Towards Adaptive Token-Level Hybrid Attention Models

Abstract: The quadratic computational complexity of softmax transformers has become a significant performance bottleneck in long-context applications. Linear attention models offer a more efficient alternative for sequential processing: they compress past key-value (KV) states into a single hidden representation, which substantially lowers complexity during both training and inference. However, the expressive power of these linear models is constrained by the capacity of their fixed-size hidden states. Prior research has suggested interleaving softmax and linear attention layers to balance computational efficiency with expressivity, but the softmax layers that remain continue to limit overall efficiency. To address this, we introduce Neural Attention Search Linear (NAtS-L), a framework that applies both linear and softmax attention within a single layer and assigns each operation to different tokens. NAtS-L dynamically decides whether a token is suitable for linear attention (tokens with short-term influence that can be captured in a fixed-size state) or requires softmax attention (tokens carrying long-term retrieval information that must be retained for future queries). By optimizing the assignment of Gated DeltaNet and softmax attention across tokens, we show that NAtS-L yields a strong yet efficient token-level hybrid architecture.
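
To make the token-level idea concrete, here is a minimal sketch of a memory that routes each token either into a fixed-size linear-attention state or into a growing KV cache read with softmax attention. It is an illustration under stated assumptions, not the NAtS-L implementation. It uses plain additive linear attention in place of Gated DeltaNet, the routing flags are a hypothetical hand-written rule rather than a learned decision, and summing the two read-outs is an assumption. The class and method names are invented for this example.

```python
import math
import random


class TokenLevelHybridMemory:
    """Toy memory: linear-attention state plus a KV cache for selected tokens."""

    def __init__(self, d_k, d_v):
        # Fixed-size state: its size never depends on sequence length.
        self.state = [[0.0] * d_v for _ in range(d_k)]
        # KV cache: grows only with tokens routed to softmax attention.
        self.keys, self.values = [], []

    def write(self, k, v, needs_softmax):
        if needs_softmax:
            # Keep the token exactly so later queries can retrieve it.
            self.keys.append(k)
            self.values.append(v)
        else:
            # Compress the token into the state via the outer product k v^T
            # (plain linear attention, NOT the Gated DeltaNet update).
            for i, k_i in enumerate(k):
                for j, v_j in enumerate(v):
                    self.state[i][j] += k_i * v_j

    def read(self, q):
        d_v = len(self.state[0])
        # Linear read-out: q^T S.
        out = [sum(q[i] * self.state[i][j] for i in range(len(q)))
               for j in range(d_v)]
        if self.keys:
            # Softmax attention over the cached tokens only.
            scale = math.sqrt(len(q))
            scores = [sum(a * b for a, b in zip(q, k)) / scale
                      for k in self.keys]
            top = max(scores)  # subtract the max for numerical stability
            weights = [math.exp(s - top) for s in scores]
            total = sum(weights)
            for w, v in zip(weights, self.values):
                for j in range(d_v):
                    out[j] += (w / total) * v[j]  # assumed: read-outs are summed
        return out


random.seed(0)
d_k, d_v, n_tokens = 4, 3, 10
memory = TokenLevelHybridMemory(d_k, d_v)
# Hypothetical routing: every fifth token is treated as "long-term".
routes = [t % 5 == 0 for t in range(n_tokens)]
for needs_softmax in routes:
    k = [random.gauss(0, 1) for _ in range(d_k)]
    v = [random.gauss(0, 1) for _ in range(d_v)]
    memory.write(k, v, needs_softmax)
output = memory.read([random.gauss(0, 1) for _ in range(d_k)])
```

In this sketch the cache holds only the two tokens flagged for softmax attention, while the other eight are folded into a state whose size stays at `d_k` by `d_v`. That is the efficiency argument the abstract makes: cache growth tracks the number of retrieval-critical tokens rather than the full sequence length.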


Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC
