GoQuant: Geometric Orthogonal Residual Projection for Multiplier-Free Power-of-Two Transformer Quantization
Abstract:
Deploying Large Language Models (LLMs) and Vision Transformers (ViTs) on edge devices faces substantial hurdles: memory is constrained, and dense Multiply-Accumulate (MAC) arrays create severe timing bottlenecks. In the ultra-low-bit regime, logarithmic Power-of-Two (PoT) quantization offers a hardware-efficient alternative because it replaces the multiplications inside MAC operations with bit-shifts. However, its non-uniform exponential lattice, whose levels grow sparse at large magnitudes, is fundamentally restricted by a Low Angular Resolution Regime. This structural deficiency is especially acute below 4 bits, where it significantly degrades high-dimensional feature manifolds.
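To make the shift-based arithmetic concrete, here is a minimal Python sketch of generic sign-magnitude PoT weight quantization; it is not GoQuant itself. The grid layout (one code reserved for zero, normalized magnitudes 2^0 down to 2^-(n-1)), nearest-level rounding, a positive per-tensor scale `alpha`, and the names `pot_grid`, `quantize_pot`, and `shift_dot` are illustrative assumptions. Activations are assumed to be integers, and the accumulator is kept in fixed point with 2^-kmax as its unit so every shift is a left shift and stays exact.

```python
import math

def pot_grid(bits):
    """Normalized magnitudes of a sign-magnitude PoT grid: zero plus 2**0 .. 2**-(n-1)."""
    n = 2 ** (bits - 1) - 1  # magnitude codes left after reserving one for zero (assumed layout)
    return [0.0] + [2.0 ** -k for k in range(n)]

def quantize_pot(w, alpha, bits):
    """Map a real weight to (sign, k) with w ~= sign * alpha * 2**-k; k is None for zero."""
    r = abs(w) / alpha
    mag = min(pot_grid(bits), key=lambda g: abs(r - g))  # nearest level; first one wins ties
    if mag == 0.0:
        return 0, None
    return (1 if w >= 0 else -1), round(-math.log2(mag))

def shift_dot(x_int, codes, bits):
    """Integer dot product with PoT codes using only shifts and adds.

    The result is in units of 2**-kmax; multiply by alpha / 2**kmax to dequantize.
    """
    kmax = 2 ** (bits - 1) - 2  # largest exponent k present on the grid
    acc = 0
    for x, (s, k) in zip(x_int, codes):
        if k is None:
            continue  # zero weight: nothing to accumulate
        term = x << (kmax - k)  # x * 2**-k, pre-scaled by 2**kmax so the shift is exact
        acc += term if s > 0 else -term
    return acc

# Usage: 3-bit weights with alpha = 1.0
codes = [quantize_pot(w, 1.0, 3) for w in [0.9, -0.3, 0.05, 0.6]]
acc = shift_dot([3, -4, 7, 2], codes, 3)
print(codes, acc / 2 ** 2)  # [(1, 0), (-1, 2), (0, None), (1, 1)] 5.0
```

With 3 bits the normalized grid is only {0, 0.25, 0.5, 1}, so 0.9 and 0.6 snap to 1 and 0.5. This coarse spacing at large magnitudes illustrates the kind of gap that limits PoT resolution at low bit-widths.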
To overcome this geometric constraint, we introduce Geometric Orthogonal Residual Projection Quantization (GoQuant), an algorithm-hardware co-design framework. GoQuant formulates quantization as a dual-basis geometric projection and adaptively constructs a higher-resolution residual lattice using only shift-and-add operations. In addition, its analytical solver serves as a practical substitute for resource-intensive gradient-based optimization, reducing full-model calibration time for LLaMA-2-7B to roughly 15 minutes.
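The abstract does not spell out the dual-basis projection or the analytical solver, so the sketch below only illustrates the general idea of a residual PoT lattice that remains shift-and-add. It uses a greedy two-term decomposition in which a second PoT term quantizes the residual left by the first, similar in spirit to additive power-of-two schemes. The greedy rule, the shared exponent range `kmax`, and the names `nearest_pot`, `dequant`, `residual_pot`, and `shift_add_mul` are assumptions for illustration, not GoQuant's algorithm.

```python
def nearest_pot(r, kmax):
    """Nearest value to r in {0} U {+-2**-k : 0 <= k <= kmax}, as (sign, k); k is None for 0."""
    cands = [(0, None, 0.0)]
    for k in range(kmax + 1):
        for s in (1, -1):
            cands.append((s, k, s * 2.0 ** -k))
    s, k, _ = min(cands, key=lambda c: abs(r - c[2]))
    return s, k

def dequant(code):
    """Normalized value represented by a list of (sign, k) terms."""
    return sum(s * 2.0 ** -k for s, k in code if k is not None)

def residual_pot(w, alpha, kmax):
    """Greedy two-term code (assumed rule): a PoT term for w/alpha, then one for the residual."""
    r = w / alpha
    first = nearest_pot(r, kmax)
    second = nearest_pot(r - dequant([first]), kmax)
    return [first, second]

def shift_add_mul(x, code, kmax):
    """x * dequant(code), scaled by 2**kmax, computed with shifts and adds only."""
    acc = 0
    for s, k in code:
        if k is None:
            continue  # zero term contributes nothing
        term = x << (kmax - k)  # exact left shift in fixed point
        acc += term if s > 0 else -term
    return acc

# Usage: kmax = 3 gives the normalized grid {0, +-1, +-0.5, +-0.25, +-0.125}
single = [nearest_pot(0.7, 3)]
double = residual_pot(0.7, 1.0, 3)
print(dequant(single), dequant(double))      # 0.5 0.75
print(shift_add_mul(8, double, 3) / 2 ** 3)  # 6.0
```

For w = 0.7, a single PoT term gives 0.5, while the residual term raises the approximation to 0.75; the product with an integer activation still needs only two shifts and one addition.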
Comprehensive evaluations demonstrate GoQuant's versatility across modalities and its hardware efficiency. Under a 3-bit weight, 16-bit activation (W3/A16) configuration, GoQuant achieves a perplexity of 6.10 on LLaMA-2-7B, outperforming conventional MAC-intensive baselines such as AWQ without requiring asymmetric scaling, and it remains competitive in 4-bit settings. At the silicon level, standard-cell RTL synthesis at a 28 nm node shows that GoQuant alleviates the timing bottlenecks associated with dense multiplier trees: by minimizing combinational logic depth, the proposed parallel shift-and-add datapath reduces the critical path delay to 0.35 ns.
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC





