arXiv

Recover-LoRA for Aggressive Quantization: Reclaiming Accuracy in 2-Bit Language Models via Low-Rank Adaptation with Knowledge Distillation on Synthetic Data

Abstract:

Deploying large language models (LLMs) on edge devices is often hindered by strict limits on memory capacity and bandwidth. Reducing weight precision to 2 bits can significantly increase inference throughput and reduce memory usage, but it usually causes a substantial drop in model accuracy. This study adapts Recover-LoRA, a lightweight, data-free technique originally designed to mitigate general weight corruption, to the specific challenge of ultra-low-bit quantization.

We introduce a selective mixed-precision approach named GateUp, in which only the gate and up projection layers of each multi-layer perceptron (MLP) block are quantized to 2 bits (W2), while all other linear layers retain higher precision. Using roofline analysis across three model architectures ranging from 4B to 20B parameters and two hardware platforms, we show that this W4/W2-GateUp configuration increases tokens per second (TPS) by 7.5% to 23.3% over uniform 4-bit (W4) deployment, depending on the model and context length. This strategy also confines the 2-bit quantization error to a known, fixed set of layers.
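
The dependence of the throughput gain on context length can be illustrated with a back-of-the-envelope, bandwidth-bound estimate. The sketch below is not the roofline analysis from the paper: it assumes decoding is purely limited by memory traffic, counts only the per-layer linear weights and a 16-bit KV cache, and ignores quantization scales, embeddings, activations, and compute. The model shape (`hidden`, `intermediate`, `n_layers`, `kv_ratio`) is hypothetical and is not taken from any of the evaluated models, so the resulting percentages are illustrative only.

```python
def decode_bytes_per_token(hidden, intermediate, n_layers, kv_ratio,
                           gate_up_bits, other_bits, context_len, kv_bits=16):
    """Approximate bytes streamed from memory to decode one token.

    Assumption: a SwiGLU-style MLP (gate, up, down) and grouped-query
    attention where K/V have `kv_ratio * hidden` channels.
    """
    kv_dim = int(hidden * kv_ratio)
    attn_params = 2 * hidden * hidden + 2 * hidden * kv_dim   # q, o + k, v
    gate_up_params = 2 * hidden * intermediate                # the W2 candidates
    down_params = hidden * intermediate
    weight_bits = (gate_up_params * gate_up_bits
                   + (attn_params + down_params) * other_bits)
    kv_cache_bits = 2 * context_len * kv_dim * kv_bits        # K and V per layer
    return n_layers * (weight_bits + kv_cache_bits) / 8


def tps_gain(context_len, **shape):
    """Relative TPS gain of W4/W2-GateUp over uniform W4, assuming TPS ~ 1/bytes."""
    w4 = decode_bytes_per_token(gate_up_bits=4, other_bits=4,
                                context_len=context_len, **shape)
    mixed = decode_bytes_per_token(gate_up_bits=2, other_bits=4,
                                   context_len=context_len, **shape)
    return w4 / mixed - 1.0


# Hypothetical model shape, chosen only for illustration.
shape = dict(hidden=4096, intermediate=12288, n_layers=32, kv_ratio=0.25)
gains = {ctx: tps_gain(ctx, **shape) for ctx in (0, 4096, 32768)}
for ctx, gain in gains.items():
    print(f"context={ctx:>6}: idealized TPS gain = {gain:.1%}")
```

Under these assumptions the gain shrinks as the context grows, because the KV cache is read at the same precision in both configurations and takes up a growing share of the memory traffic.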

To address the accuracy loss caused by 2-bit quantization of these layers, we employ Recover-LoRA. This method trains low-rank adapters on the quantized components by distilling logits on synthetic data. In a detailed case study on Qwen3-4B, Recover-LoRA restored 80% to 95% of the accuracy on nine of twelve benchmarks, using just 10,000 synthetic training samples and no labeled data. Our findings further indicate that synthetic data yields distillation-based recovery comparable to that achieved with curated labeled datasets, and that the recovery carries over to out-of-distribution evaluation tasks. These results position Recover-LoRA as a practical solution for post-quantization accuracy recovery in scenarios that demand aggressive weight compression.
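
The idea of training low-rank adapters on a frozen quantized layer with a logit-distillation objective can be sketched on a toy problem. The code below is a minimal illustration in pure Python, not the authors' implementation: the single linear layer, the round-to-nearest 2-bit quantizer, the random Gaussian inputs standing in for synthetic data, the full-precision layer acting as teacher, and all sizes and hyperparameters are assumptions made for this example.

```python
import math
import random

random.seed(0)
D_IN, D_OUT, RANK, N = 8, 6, 2, 32   # hypothetical toy sizes


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(z):
    m = max(z)
    e = [math.exp(t - m) for t in z]
    s = sum(e)
    return [t / s for t in e]


def quantize_2bit(W):
    """Per-row round-to-nearest onto 4 levels (a stand-in quantizer)."""
    levels = [-1.5, -0.5, 0.5, 1.5]
    out = []
    for row in W:
        scale = max(abs(w) for w in row) / 1.5 or 1.0
        out.append([scale * min(levels, key=lambda q: abs(w / scale - q))
                    for w in row])
    return out


def student_logits(Wq, A, B, x):
    """Frozen quantized weights plus the low-rank correction B @ (A @ x)."""
    h = matvec(A, x)
    return [b + d for b, d in zip(matvec(Wq, x), matvec(B, h))], h


def mean_kl(W, Wq, A, B, xs):
    """Average KL(teacher || student) over the inputs."""
    total = 0.0
    for x in xs:
        p_t = softmax(matvec(W, x))
        p_s = softmax(student_logits(Wq, A, B, x)[0])
        total += sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return total / len(xs)


def distill(W, Wq, A, B, xs, lr=0.1, steps=300):
    """Full-batch gradient descent on A and B only; Wq stays frozen."""
    for _ in range(steps):
        gA = [[0.0] * D_IN for _ in range(RANK)]
        gB = [[0.0] * RANK for _ in range(D_OUT)]
        for x in xs:
            p_t = softmax(matvec(W, x))
            z_s, h = student_logits(Wq, A, B, x)
            # Gradient of KL(teacher || student) w.r.t. student logits.
            g = [ps - pt for ps, pt in zip(softmax(z_s), p_t)]
            gh = [sum(B[i][r] * g[i] for i in range(D_OUT)) for r in range(RANK)]
            for i in range(D_OUT):
                for r in range(RANK):
                    gB[i][r] += g[i] * h[r] / len(xs)
            for r in range(RANK):
                for j in range(D_IN):
                    gA[r][j] += gh[r] * x[j] / len(xs)
        for i in range(D_OUT):
            for r in range(RANK):
                B[i][r] -= lr * gB[i][r]
        for r in range(RANK):
            for j in range(D_IN):
                A[r][j] -= lr * gA[r][j]


W = [[random.gauss(0, 1) for _ in range(D_IN)] for _ in range(D_OUT)]   # teacher
Wq = quantize_2bit(W)                                                   # frozen student base
A = [[random.gauss(0, 0.3) for _ in range(D_IN)] for _ in range(RANK)]
B = [[0.0] * RANK for _ in range(D_OUT)]      # zero init: adapter starts as a no-op
xs = [[random.gauss(0, 1) for _ in range(D_IN)] for _ in range(N)]      # unlabeled inputs

loss_before = mean_kl(W, Wq, A, B, xs)
distill(W, Wq, A, B, xs)
loss_after = mean_kl(W, Wq, A, B, xs)
print(f"mean KL before: {loss_before:.4f}, after: {loss_after:.4f}")
```

No labels appear anywhere in the loop: the only training signal is the teacher's output distribution on the unlabeled inputs, which is what makes synthetic data usable for this kind of recovery.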


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
