LiftQuant: Continuous Bit-Width LLM via Dimensional Lifting and Projection
Title: LiftQuant: Enabling Continuous Bit-Width in Large Language Models Through Dimensional Lifting and Projection
Abstract: Current quantization techniques are constrained to fixed, integer bit-widths, such as 2-bit or 3-bit configurations. This creates a "deployment gap": Large Language Models (LLMs) cannot be matched precisely to a given memory budget. To address this limitation, we present LiftQuant, a framework that enables continuous bit-width control for Pareto-optimal deployment. Its central innovation is a "lift-then-project" strategy, which approximates low-dimensional weight vectors by projecting a simple 1-bit lattice from a higher-dimensional "lifted" space. The resulting effective bit-width equals the ratio of the lifted dimension to the original dimension. Because these dimensions are flexible structural parameters, the bit-width can be tuned quasi-continuously. The projection yields a structured yet non-uniform codebook, capturing the expressive power of Vector Quantization (VQ). While retaining this advantage of VQ, LiftQuant decodes using only linear transformations and 1-bit uniform quantizers, which preserves hardware efficiency. This flexibility has direct practical impact: LiftQuant compresses a 70B-parameter LLM to 2.4 bits, precisely matching the capacity of a 24GB GPU, while significantly outperforming state-of-the-art 2-bit models configured for the same hardware. Our code and checkpoints are available at https://github.com/Heliulu/LiftQuant.
Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
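The bit-width arithmetic behind "lift-then-project" can be illustrated with a toy quantizer. The Python sketch below is a minimal illustration under loudly stated assumptions, not LiftQuant's implementation. The projection matrix is a random Gaussian matrix (the abstract does not say how LiftQuant builds its projection). Encoding is a brute-force search over all 2^n sign patterns, which is feasible only for tiny lifted dimensions. The helper names `effective_bits`, `make_projection`, `encode`, `decode`, and `weight_bytes` are hypothetical. With an original dimension of 5 and a lifted dimension of 12, each group of 5 weights is stored as 12 sign bits, giving 12/5 = 2.4 bits per weight.

```python
import itertools
import random
from fractions import Fraction


def effective_bits(orig_dim: int, lifted_dim: int) -> Fraction:
    # Each group of orig_dim weights is stored as lifted_dim sign bits,
    # so the cost per weight is lifted_dim / orig_dim bits.
    return Fraction(lifted_dim, orig_dim)


def make_projection(orig_dim: int, lifted_dim: int, seed: int = 0) -> list[list[float]]:
    # ASSUMPTION (hypothetical): a random Gaussian orig_dim x lifted_dim matrix
    # stands in for LiftQuant's projection. Scaling by 1/sqrt(lifted_dim) gives
    # each projected coordinate roughly unit variance for +/-1 lifted entries.
    rng = random.Random(seed)
    scale = lifted_dim ** -0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(lifted_dim)]
            for _ in range(orig_dim)]


def decode(P: list[list[float]], signs: tuple[int, ...]) -> list[float]:
    # Decoding is a single linear map applied to a 1-bit vector in {-1, +1}^lifted_dim.
    return [sum(p * s for p, s in zip(row, signs)) for row in P]


def encode(P: list[list[float]], w: list[float]) -> tuple[tuple[int, ...], float]:
    # Brute-force nearest-codeword search over all 2**lifted_dim sign patterns.
    # Only practical for tiny lifted_dim; shown here purely for illustration.
    lifted_dim = len(P[0])
    best_signs, best_err = None, float("inf")
    for signs in itertools.product((-1, 1), repeat=lifted_dim):
        err = sum((wi - vi) ** 2 for wi, vi in zip(w, decode(P, signs)))
        if err < best_err:
            best_signs, best_err = signs, err
    return best_signs, best_err


def weight_bytes(num_params: int, orig_dim: int, lifted_dim: int) -> float:
    # Weights-only storage estimate in bytes; ignores scales, KV cache and activations.
    return num_params * lifted_dim / (8 * orig_dim)


if __name__ == "__main__":
    d, n = 5, 12
    print(effective_bits(d, n), float(effective_bits(d, n)))  # 12/5 2.4

    P = make_projection(d, n)
    signs, err = encode(P, [0.1, -0.5, 1.0, 0.3, -0.2])
    print(len(signs), "sign bits store", d, "weights; squared error", round(err, 4))

    for label, (od, ld) in {"2-bit": (1, 2), "2.4-bit": (5, 12), "3-bit": (1, 3)}.items():
        print(label, f"{weight_bytes(70_000_000_000, od, ld) / 1e9:.2f} GB")
```

Under this rough weights-only estimate, a 70B model needs 17.50 GB at 2 bits, 21.00 GB at 2.4 bits, and 26.25 GB at 3 bits. The integer options either underuse a 24GB budget or exceed it, while a fractional bit-width lands in between and leaves some headroom for the overheads the estimate ignores.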




