arXiv

dMX: Differentiable Mixed-Precision Assignment for Low-Precision Floating-Point Formats

Title: dMX: A Differentiable Framework for Mixed-Precision Assignment in Low-Precision Floating-Point Formats

Abstract

Quantizing large language models (LLMs) to low-precision floating-point formats is essential for efficient deployment, but applying a uniform bit-width across all layers is suboptimal for both accuracy and performance. To address this, we present dMX, a differentiable mixed-precision quantization framework for learnable floating-point bit-width assignment. We apply dMX to the microscaling floating-point (MXFP) data types defined by the Open Compute Project (OCP) standard.

We treat per-layer bit-width assignment as a continuous optimization problem. Each layer’s floating-point format is defined by a single scalar parameter, which condenses a multivariate design space into one learnable offset. The offset takes continuous values during training, which prevents the erratic oscillations typically seen when switching between discrete quantization formats. To ensure that the final configuration maps onto hardware-compatible MXFP formats without abrupt shifts in behavior between training and inference, we employ a temperature-based annealing schedule that progressively discretizes the learned offsets.
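
As an illustration of the idea only, the following minimal Python sketch shows one way a scalar offset could be relaxed into soft weights over candidate bit-widths and then sharpened by lowering a temperature. It is not the dMX formulation, which the abstract does not specify. The scoring rule (negative squared distance), the exponential decay schedule, the default temperatures, and all function names are assumptions made for this example; the candidate widths 4, 6, and 8 correspond to the MXFP4, MXFP6, and MXFP8 element types.

```python
import math

# Element bit-widths of the OCP MXFP element types (MXFP4, MXFP6, MXFP8).
CANDIDATE_BITS = (4.0, 6.0, 8.0)


def soft_assignment(offset, temperature):
    """Softmax weights over candidate bit-widths for one layer.

    Assumed scoring rule: negative squared distance between the learnable
    scalar offset and each candidate width, divided by the temperature.
    """
    logits = [-((offset - b) ** 2) / temperature for b in CANDIDATE_BITS]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_bits(offset, temperature):
    """Weighted average bit-width implied by the soft assignment."""
    weights = soft_assignment(offset, temperature)
    return sum(w * b for w, b in zip(weights, CANDIDATE_BITS))


def annealed_temperature(step, total_steps, t_start=4.0, t_end=0.01):
    """Exponential decay from t_start to t_end (hypothetical schedule)."""
    frac = step / max(total_steps - 1, 1)
    return t_start * (t_end / t_start) ** frac


if __name__ == "__main__":
    offset = 6.8  # hypothetical learned offset for one layer
    for step in (0, 50, 99):
        t = annealed_temperature(step, 100)
        print(step, round(t, 4), round(expected_bits(offset, t), 4))
```

At a high temperature the expected bit-width is a smooth blend of all candidates, so gradients can flow to the offset; as the temperature approaches zero the weights concentrate on the candidate nearest the offset, which is the discretization effect the annealing schedule is meant to achieve.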

Furthermore, a target-aware regularization term guides the average bit-width toward a user-defined budget. The average bit-width serves as a coarse-grained proxy for inference cost, so the budget lets users balance deployment efficiency against model quality. We evaluate dMX across several LLM families, including Llama, Qwen3, and SmolLM2, measuring perplexity on WikiText-2 and accuracy on four zero-shot reasoning benchmarks. The results show that dMX consistently produces Pareto-dominating models, outperforming Kullback-Leibler (KL) divergence-based layer-selection heuristics and efficiently managing the trade-off between model fidelity and average bit-width.
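
The budget term mentioned above can be pictured with a small Python sketch. The abstract does not give the form of the regularizer, so the quadratic penalty, the unweighted mean over layers, and the names `average_bits`, `budget_penalty`, and `strength` are all assumptions chosen for illustration.

```python
def average_bits(layer_bits):
    """Unweighted mean bit-width over layers (assumed aggregation)."""
    return sum(layer_bits) / len(layer_bits)


def budget_penalty(layer_bits, target_bits, strength=1.0):
    """Quadratic penalty on the gap between mean bit-width and the budget.

    This is a hypothetical form: zero when the average matches the target,
    growing smoothly as the average drifts above or below it.
    """
    gap = average_bits(layer_bits) - target_bits
    return strength * gap ** 2


if __name__ == "__main__":
    # Hypothetical per-layer expected bit-widths during training.
    bits = [4.0, 6.0, 8.0, 8.0]
    print(average_bits(bits), budget_penalty(bits, target_bits=6.0))
```

In training, a term of this kind would be added to the task loss so that the optimizer trades accuracy against the budget. A variant that weights each layer by its parameter count would track memory footprint more closely, but whether dMX does so is not stated in the abstract.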


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
