arXiv

WUSH: Near-Optimal Adaptive Transforms for LLM Quantization

Abstract:

Quantizing both weights and activations is a widely adopted strategy for efficient large language model (LLM) deployment, but severe outliers expand the dynamic range and amplify quantization error in low-bit formats. Existing transform-based remedies, such as Hadamard rotations, are static and independent of the data, so it is unclear whether they are optimal for quantization. In this work, we establish closed-form optimal linear blockwise transforms for joint weight-activation quantization with standard round-to-nearest (RTN), AbsMax-scaled block quantizers, applicable to both integer (INT) and floating-point (FP) formats. The proposed method, WUSH, combines a Hadamard base with a data-driven second-moment component to form a non-orthogonal transform. Under mild assumptions, it is proven to be near-optimal for both FP and INT quantizers, and it supports an efficient, fused implementation on GPUs. Empirically, WUSH improves W4A4 (4-bit weight, 4-bit activation) accuracy over leading Hadamard-based baselines; for instance, on Llama-3.1-8B-Instruct with MXFP4, it yields an average improvement of +2.8 points with RTN and +0.7 points with GPTQ. The method also achieves up to 5.8$\times$ higher per-layer throughput than BF16 through FP4 MatMul operations. The source code is publicly accessible at https://github.com/IST-DASLab/WUSH.
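
To make the outlier problem concrete, the sketch below quantizes one block with an AbsMax-scaled round-to-nearest quantizer, once directly and once after a static Hadamard rotation. It illustrates only the Hadamard baseline that the abstract contrasts against; it is not the WUSH transform, whose data-driven second-moment component is not reproduced here. The function names, the symmetric INT4 grid of [-7, 7], the block size of 32, and the synthetic input (unit-magnitude entries plus one outlier of 40) are all assumptions chosen for illustration.

```python
import math
import random


def absmax_rtn_int4(block):
    """AbsMax-scaled round-to-nearest onto a symmetric INT4 grid (assumed [-7, 7])."""
    qmax = 7
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return list(block)
    scale = amax / qmax  # the largest magnitude in the block maps to +/-7
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in block]


def hadamard(block):
    """Orthonormal fast Walsh-Hadamard transform; length must be a power of two."""
    out = list(block)
    n = len(out)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    norm = math.sqrt(n)
    return [v / norm for v in out]


def squared_error(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def compare_errors(seed=0, n=32, outlier=40.0):
    """Return (direct_error, rotated_error) for one synthetic block."""
    rng = random.Random(seed)
    x = [rng.choice([-1.0, 1.0]) for _ in range(n)]
    x[3] = outlier  # a single severe outlier stretches the dynamic range

    # Direct quantization: the outlier sets the scale, so small entries round to 0.
    direct = absmax_rtn_int4(x)

    # Rotate, quantize, rotate back. The normalized Hadamard matrix is symmetric
    # and orthogonal, so applying it twice recovers the original basis.
    rotated = hadamard(absmax_rtn_int4(hadamard(x)))

    return squared_error(x, direct), squared_error(x, rotated)


if __name__ == "__main__":
    direct_err, rotated_err = compare_errors()
    print(f"direct squared error:  {direct_err:.3f}")
    print(f"rotated squared error: {rotated_err:.3f}")
```

In this toy block the rotation spreads the outlier's energy across all 32 coordinates, so the scale shrinks and the small entries are no longer rounded to zero. A static rotation of this kind ignores the data statistics, which is the gap the abstract says the data-dependent transform is meant to close.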


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
