arXiv

MorphoQuant: Modality-Aware Quantization for Omni-modal Large Language Models

June 4, 2026 · Yue Wu, Changyuan Wang, Zixuan Wang, Shilin Ma, Yansong Tang

Abstract:

Post-Training Quantization (PTQ) methods typically struggle when applied to 4-bit Omni-modal Large Language Models (OLLMs). The difficulty stems from the stark heterogeneity of data distributions across modalities and from the distinct outlier patterns each modality exhibits. To address this, we present MorphoQuant, a PTQ framework designed to preserve cross-modal morphology and to limit the loss of critical outlier information.

Central to our approach is the Distribution-Aware Bias Compensation (DABC) mechanism, which selectively absorbs long-tailed outliers into channel-wise biases. This preserves the magnitude of the outliers while leaving the dense inlier data to be discretized at high precision, so that quantization stays accurate across the differing distributions of each modality. We further introduce Morphology-Directed Quantization Function Optimization (MDQFO), which co-optimizes the quantization grid together with the bias mask to achieve fine-grained alignment throughout the model.
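The abstract does not spell out the DABC or MDQFO procedures, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes symmetric per-token 4-bit activation quantization, takes a compensated channel's bias to be its mean over a calibration batch, and replaces MDQFO's co-optimization with a coarse joint grid search over a clipping ratio and the number of compensated channels. The helper names (`quantize_per_token`, `dabc_quantize`, `search_grid_and_mask`) and the synthetic data are hypothetical. Subtracting the bias before quantization and adding it back in full precision keeps outlier channels from inflating each token's step size; in a linear layer the added-back term could be folded away, since W(q(x - b) + b) = W q(x - b) + W b.

```python
import numpy as np

# Minimal sketch under stated assumptions, NOT the MorphoQuant implementation.
# All function names below are hypothetical.


def quantize_per_token(x, bits=4, clip=1.0):
    """Symmetric per-token (per-row) fake quantization to a signed `bits`-bit grid.

    `clip` scales down each row's max-abs value before the step size is computed;
    values outside the clipped range saturate at the grid edges.
    """
    qmax = 2 ** (bits - 1) - 1                         # 7 for signed 4-bit
    absmax = np.abs(x).max(axis=1, keepdims=True) * clip
    scale = np.maximum(absmax, 1e-8) / qmax            # guard against all-zero rows
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale                                   # dequantize back to float


def dabc_quantize(x, bias, bits=4, clip=1.0):
    """Assumed DABC-style step: quantize x - bias, then add the full-precision bias back."""
    return quantize_per_token(x - bias, bits, clip) + bias


def search_grid_and_mask(x, bits=4, clips=(1.0, 0.9, 0.8, 0.7), ks=(0, 1, 2, 4, 8)):
    """Hypothetical stand-in for MDQFO: grid-search the clipping ratio and bias mask jointly.

    Candidate masks compensate the k channels with the largest |mean|; a masked
    channel's bias is its calibration mean, unmasked channels get 0.
    Returns (mse, clip, mask) with the lowest reconstruction error.
    """
    means = x.mean(axis=0)
    order = np.argsort(-np.abs(means))                 # most-shifted channels first
    best = None
    for k in ks:
        mask = np.zeros(x.shape[1], dtype=bool)
        mask[order[:k]] = True
        bias = np.where(mask, means, 0.0)
        for c in clips:
            err = np.mean((dabc_quantize(x, bias, bits, c) - x) ** 2)
            if best is None or err < best[0]:
                best = (err, c, mask)
    return best


# Synthetic calibration activations: 64 tokens x 128 channels, two shifted outlier channels.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(64, 128))
x[:, [3, 77]] += 50.0

plain_err = np.mean((quantize_per_token(x) - x) ** 2)
best_err, best_clip, best_mask = search_grid_and_mask(x)
print(f"plain MSE={plain_err:.4f}, searched MSE={best_err:.4f}, "
      f"clip={best_clip}, compensated channels={np.flatnonzero(best_mask).tolist()}")
```

On this synthetic input the two shifted channels dominate every token's range, so plain per-token quantization rounds most inliers to zero, while compensating those channels lets the remaining values use the full 4-bit grid. The exhaustive search is only practical for a handful of candidates; it stands in for whatever optimization MDQFO actually performs.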

We evaluate MorphoQuant extensively on the Qwen2.5-Omni model across benchmarks including Video-MME and MMMU, where it shows clear gains over existing methods. Most notably, our W4A4 configuration (4-bit weights and 4-bit activations) reaches 76.63% on ScienceQA, which not only significantly exceeds current state-of-the-art W4A4 methods but also, surprisingly, outperforms the W4A16 baseline. These results indicate that MorphoQuant offers a strong trade-off between accuracy and efficiency.

Source: arXiv Generated at: 2026-06-04 00:00:00 UTC