arXiv

Qift: Shift-Friendly No-Zero W2 Post-Training Quantization for Rotated W2A4/KV4 LLM Inference

Title: Qift: A Zero-Free W2 Post-Training Quantization Approach for Efficient Rotated W2A4/KV4 LLM Inference

Abstract:

While two-bit weight quantization offers significant advantages for memory-efficient large language model (LLM) inference, the conventional W2 level set, {-2, -1, 0, +1}, frequently degrades performance under demanding W2A4/KV4 configurations. This study investigates the geometric properties of two-bit weight level sets within a quantization framework that uses Hadamard rotation. Our findings indicate that standard asymmetric W2 quantization yields substantial improvements over the traditional level set, suggesting that the limitations of W2A4 stem not only from the bit width but also from reconstruction accuracy.
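As a rough illustration of the Hadamard rotation this framework relies on, the following sketch applies an orthonormal fast Walsh-Hadamard transform to a synthetic weight row. The function names (`fwht`, `excess_kurtosis`, `dot`) and the synthetic data (a Gaussian bulk with four injected outliers) are assumptions made for illustration and are not taken from the paper. The sketch checks two well-known properties: rotating both the weights and the activations leaves their dot product unchanged, and the rotation spreads outliers so that the row looks far less heavy-tailed.

```python
import math
import random

def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform; length must be a power of two."""
    out = list(vec)
    n = len(out)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b  # butterfly step
        h *= 2
    norm = math.sqrt(n)
    return [v / norm for v in out]

def excess_kurtosis(values):
    """Sample excess kurtosis (0 for an ideal Gaussian)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

rng = random.Random(0)
n = 1024
# Hypothetical weight row: Gaussian bulk plus four large outliers.
w = [rng.gauss(0.0, 1.0) for _ in range(n)]
for k, idx in enumerate((3, 200, 517, 900)):
    w[idx] = 40.0 if k % 2 == 0 else -40.0
# Hypothetical activation vector.
x = [rng.gauss(0.0, 1.0) for _ in range(n)]

w_rot, x_rot = fwht(w), fwht(x)
print("dot product before/after:", dot(w, x), dot(w_rot, x_rot))
print("excess kurtosis before/after:", excess_kurtosis(w), excess_kurtosis(w_rot))
```

Because the normalized Hadamard matrix is orthogonal, the layer output is preserved in exact arithmetic, so any accuracy change comes only from quantizing the rotated tensors.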

An analysis of 224 linear modules in both LLaMA-2-7B and LLaMA-3.1-8B reveals that pretrained weights are already nearly zero-centered. Furthermore, applying Hadamard rotation effectively Gaussianizes their standardized distribution, reducing excess kurtosis and Q-Q error by several orders of magnitude. Leveraging this approximately zero-centered, Gaussian-like source model, we introduce Qift, a training-free, fixed no-zero W2 level set designed for rotated W2A4/KV4 inference. The primary level set is {+/-0.5, +/-1.5}, which corresponds to {+/-1, +/-3} under a half-scale reparameterization. Alternatively, a power-of-two variant uses {+/-1, +/-4} so that decoded weights can be applied with a sign flip and a bit shift.
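The level sets described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the absmax per-channel scale, the nearest-level rounding, the 2-bit code layout (bit 1 as sign, bit 0 as inner/outer), and every function name are hypothetical choices made for the example.

```python
MNZ_LEVELS = (-1.5, -0.5, 0.5, 1.5)   # primary no-zero level set
MNZ_HALF_SCALE = (-3, -1, 1, 3)       # same set once the scale is halved
# Assumed code layout (not from the source): bit 1 = sign, bit 0 = outer level.
POT_BY_CODE = (1, 4, -1, -4)          # power-of-two variant indexed by 2-bit code

def nearest_code(w, levels, scale):
    """Index of the scaled level closest to w."""
    return min(range(len(levels)), key=lambda c: abs(w - scale * levels[c]))

def quantize_row(row, levels):
    """Per-channel absmax scale (an assumption), then nearest-level 2-bit codes."""
    scale = max(abs(w) for w in row) / max(abs(l) for l in levels)
    return scale, [nearest_code(w, levels, scale) for w in row]

def dequantize_row(scale, codes, levels):
    return [scale * levels[c] for c in codes]

def pot_apply(code, activation):
    """Multiply an integer activation by a {+/-1, +/-4} weight via shift and sign."""
    shifted = activation << (2 * (code & 1))   # x1 for inner, x4 for outer
    return -shifted if code & 2 else shifted

row = [0.9, -0.2, 0.1, -1.2]                   # hypothetical weight row
scale, codes = quantize_row(row, MNZ_LEVELS)
full = dequantize_row(scale, codes, MNZ_LEVELS)
half = dequantize_row(scale / 2, codes, MNZ_HALF_SCALE)
print(codes, full, half)
```

Halving the scale turns {+/-0.5, +/-1.5} into the integer set {+/-1, +/-3} without changing the reconstructed weights, and `pot_apply` shows why {+/-1, +/-4} needs no multiplier: the magnitude is a shift by 0 or 2 bits and the sign is a negation.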

Qift eliminates the need for learned codebooks, group grids, zero points, or redesigns of the fixed two-bit code-to-level mapping, while maintaining standard per-channel scaling. Through scale-invariant ratio analysis, we identify an optimal inner-to-outer centroid ratio range of 0.25 to 0.33. This range explains why mirror no-zero (MNZ), Lloyd, NF2, and PoT-MNZ perform well and why the {+/-1, +/-2} set, whose ratio is 0.5, does not.
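To make the ratio argument concrete, the sketch below evaluates symmetric four-level sets on a unit Gaussian source, which is the source model the abstract motivates. It is an illustrative assumption that mean squared error on N(0, 1) with a grid-searched scale is a reasonable proxy; the paper's own ratio analysis may be defined differently, and the function names are hypothetical.

```python
import math

def pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gaussian_mse(inner, outer):
    """MSE of rounding N(0, 1) to the nearest level in {+/-inner, +/-outer}."""
    t = 0.5 * (inner + outer)                              # decision threshold
    cross = inner * (pdf(0.0) - pdf(t)) + outer * pdf(t)   # E[x q(x); x > 0]
    power = inner ** 2 * (cdf(t) - 0.5) + outer ** 2 * (1.0 - cdf(t))
    return 1.0 - 4.0 * cross + 2.0 * power                 # E[x^2] - 2E[xq] + E[q^2]

def best_mse(ratio):
    """Lowest MSE over a grid of outer levels, so only the ratio matters."""
    grid = (k * 0.001 for k in range(500, 3001))
    return min(gaussian_mse(ratio * outer, outer) for outer in grid)

def lloyd_gaussian_4level(iterations=200):
    """Lloyd iteration for the symmetric 4-level quantizer of N(0, 1)."""
    a, b = 0.5, 1.5
    for _ in range(iterations):
        t = 0.5 * (a + b)
        a = (pdf(0.0) - pdf(t)) / (cdf(t) - 0.5)   # centroid of (0, t)
        b = pdf(t) / (1.0 - cdf(t))                # centroid of (t, inf)
    return a, b

a, b = lloyd_gaussian_4level()
print("Lloyd inner/outer ratio:", a / b)
for name, ratio in (("{1, 4}", 0.25), ("{0.5, 1.5}", 1.0 / 3.0), ("{1, 2}", 0.5)):
    print(name, "ratio", round(ratio, 3), "best MSE", best_mse(ratio))
```

Under this model the Lloyd ratio lands near 0.30, inside the 0.25 to 0.33 band, and the {+/-1, +/-2} set has a clearly higher best-case error than either no-zero set.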

Experimental results on both models show that, compared with the standard W2 level set, these no-zero level sets consistently improve perplexity for pure W2A4 and for mixed W2/W4 configurations across L layers, as well as downstream accuracy and GPTQ residual behavior. Specifically, at a mixed-precision setting of L=16, these sets significantly reduce the performance gap relative to W3A4 while preserving two-bit precision for half of the transformer layers. Consequently, Qift provides a straightforward, source-aware, and deployment-ready alternative to more complex learned W2 codebooks.


Source: arXiv Generated at: 2026-06-03 00:00:00 UTC
