GoQuant: Geometric Orthogonal Residual Projection for Multiplier-Free Power-of-Two Transformer Quantization

Abstract:

The integration of Large Language Models (LLMs) and Vision Transformers (ViTs) into edge computing environments faces substantial hurdles due to memory constraints and the severe timing bottlenecks caused by dense Multiply-Accumulate (MAC) arrays. In the ultra-low bit domain, logarithmic Power-of-Two (PoT) quantization emerges as a hardware-efficient solution by substituting MAC operations with bit-shifts. Nevertheless, the non-uniform exponential lattice is fundamentally restricted by a Low Angular Resolution Regime. This structural deficiency is especially acute at sub-4-bit levels, resulting in significant degradation of high-dimensional feature manifolds.
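To make the bit-shift substitution concrete, the following is a minimal sketch of generic PoT quantization, not the paper's implementation. It assumes integer (fixed-point) activations and a hypothetical exponent range of [-3, 0]; the names pot_quantize and shift_mul are illustrative only.

```python
import math

def pot_quantize(w, min_exp=-3, max_exp=0):
    """Round a real weight to the nearest signed power of two (log domain).

    Returns (sign, exp) so that w is approximated by sign * 2**exp.
    The exponent range is an assumed example, not a value from the paper.
    """
    if w == 0:
        return 0, min_exp  # sign 0 encodes an exact zero
    sign = 1 if w > 0 else -1
    exp = round(math.log2(abs(w)))
    exp = max(min_exp, min(max_exp, exp))  # clamp to the representable range
    return sign, exp

def shift_mul(x, sign, exp):
    """Multiply an integer activation x by sign * 2**exp using shifts only."""
    if sign == 0:
        return 0
    shifted = x << exp if exp >= 0 else x >> -exp
    return sign * shifted

# Example: the weight 0.3 is stored as +2**-2, so the product is one right shift.
sign, exp = pot_quantize(0.3)
print(sign, exp, shift_mul(40, sign, exp))
```

Because every representable magnitude is a single power of two, neighbouring levels differ by a factor of two, which is one way to picture the coarse resolution the abstract describes.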

To overcome this geometric constraint, we introduce Geometric Orthogonal Residual Projection Quantization (GoQuant), a framework that integrates algorithmic and hardware design. GoQuant treats quantization as a dual-basis geometric projection, adaptively constructing a higher-resolution residual lattice exclusively through shift-and-add operations. Additionally, its analytical solver serves as a practical substitute for resource-heavy gradient-based optimization, slashing the full-model calibration time for LLaMA-2-7B to roughly 15 minutes.
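The idea of refining a PoT lattice with a residual term can be illustrated with a small sketch. This is an assumption-laden toy, not GoQuant's analytical solver: it greedily approximates each weight as the sum of two signed powers of two (which hardware could evaluate with two shifts and one add) and compares the cosine similarity against a single-term approximation. The function names and the exponent range [-4, 0] are hypothetical.

```python
import math

def nearest_pot(v, min_exp=-4, max_exp=0):
    """Nearest signed power of two to v in the log domain (0.0 stays 0.0)."""
    if v == 0:
        return 0.0
    exp = round(math.log2(abs(v)))
    exp = max(min_exp, min(max_exp, exp))
    return math.copysign(2.0 ** exp, v)

def two_term_pot(w, min_exp=-4, max_exp=0):
    """Greedy two-term approximation: w ~ first + second, both powers of two.

    The second term quantizes the residual and is dropped when it would
    not reduce the error (for example, when the residual is below range).
    """
    first = nearest_pot(w, min_exp, max_exp)
    residual = w - first
    second = nearest_pot(residual, min_exp, max_exp)
    if abs(residual - second) >= abs(residual):
        second = 0.0
    return first, second

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

weights = [0.3, -0.7, 0.5, 0.12]          # arbitrary example values
single = [nearest_pot(w) for w in weights]
double = [sum(two_term_pot(w)) for w in weights]
print(cosine(weights, single), cosine(weights, double))
```

On this example vector the two-term approximation is better aligned in direction with the original weights than the single-term one, which is the kind of angular improvement a residual lattice is meant to provide.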

Comprehensive evaluations highlight GoQuant’s versatility across different modalities and its hardware efficiency. Under a 3-bit (W3/A16) configuration, it attains a perplexity of 6.10 on LLaMA-2-7B, outperforming conventional MAC-intensive baselines such as AWQ without the need for asymmetric scaling. It also preserves competitive accuracy in 4-bit settings. At the silicon level, standard-cell RTL synthesis conducted at a 28nm node demonstrates that GoQuant successfully alleviates timing bottlenecks linked to dense multiplier trees. By minimizing combinational logic depth, the proposed parallel shift-and-add datapath lowers the critical path delay to 0.35 ns.


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
