arXiv

Title: Customizing the Inductive Biases of Softmax Attention using Structured Matrices

Abstract: The core of attention is its scoring function, which projects the inputs into low-dimensional query and key vectors and takes the dot product of each pair. Although the low-dimensional projection improves computational efficiency, it can lose information on tasks whose inputs are intrinsically high-dimensional. Standard attention also applies the same scoring function to every input pair, with no distance-dependent compute bias that favors neighboring tokens in the sequence. To address these limitations, we introduce scoring functions built on computationally efficient, high-rank structured matrices, specifically Block Tensor-Train (BTT) and contiguous Multi-Level Low Rank (MLR) matrices. On in-context regression tasks with high-dimensional inputs, these scoring functions outperform standard attention at any fixed compute budget. On language modeling, a task that exhibits locality patterns, our MLR-based attention achieves better scaling laws than both standard attention and sliding window variants. We further show that BTT and MLR both belong to a broader family of efficient structured matrices that can encode either full-rank or distance-dependent compute biases, thereby addressing these shortcomings of standard attention. Finally, we show that MLR attention gives promising results for long-range time-series forecasting.
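The rank limitation described in the abstract can be illustrated with a small NumPy sketch. It is not the paper's implementation: the function names, the power-of-two block layout, and the per-level ranks are assumptions chosen only so that both matrices use the same number of parameters. It compares the bilinear form W_Q W_K^T of standard attention, whose rank is capped by the query/key dimension, with a multi-level sum of block-diagonal low-rank matrices.

```python
import numpy as np


def low_rank_score_matrix(d_model, rank, rng):
    """Bilinear form W_Q @ W_K.T of standard attention; its rank is at most `rank`."""
    w_q = rng.standard_normal((d_model, rank))
    w_k = rng.standard_normal((d_model, rank))
    return w_q @ w_k.T


def multi_level_score_matrix(d_model, ranks, rng):
    """Hypothetical multi-level low-rank matrix (illustrative layout, an assumption).

    Level l splits the matrix into 2**l equal diagonal blocks and adds a
    rank-ranks[l] factor to each block; the levels are summed.
    """
    total = np.zeros((d_model, d_model))
    for level, r in enumerate(ranks):
        n_blocks = 2 ** level
        block = d_model // n_blocks
        for b in range(n_blocks):
            sl = slice(b * block, (b + 1) * block)
            u = rng.standard_normal((block, r))
            v = rng.standard_normal((block, r))
            total[sl, sl] += u @ v.T
    return total


rng = np.random.default_rng(0)
d_model = 16

# Both parameterizations below use 128 parameters:
#   standard:    2 * 16 * 4                           = 128
#   multi-level: 2*16*2 + 2*(2*8*1) + 4*(2*4*1)       = 128
standard = low_rank_score_matrix(d_model, rank=4, rng=rng)
multi_level = multi_level_score_matrix(d_model, ranks=[2, 1, 1], rng=rng)

rank_standard = np.linalg.matrix_rank(standard)        # bounded by 4
rank_multi_level = np.linalg.matrix_rank(multi_level)  # bounded by 2 + 2 + 4 = 8

print("standard rank:", rank_standard)
print("multi-level rank:", rank_multi_level)
```

With the same parameter budget, the block-structured sum can reach a higher rank than the single low-rank product. This is the kind of capacity the abstract attributes to high-rank structured matrices. The finer diagonal blocks also spend extra parameters on nearby indices, loosely mirroring the distance-dependent bias the abstract mentions.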


Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
