LASER: Loss-Aware Singular-value Decomposition and Rank Allocation for Efficient Low-Precision Vision-Language Models
Abstract: While Vision-Language Models (VLMs) offer robust multimodal reasoning, their substantial computational demands and extensive parameter counts pose significant barriers to deployment on devices with limited resources. Although low-rank decomposition is a leading compression strategy, current approaches frequently prioritize local matrix reconstruction, employ uniform or heuristic rank distribution, and concentrate primarily on attention mechanisms, often neglecting feed-forward networks (FFNs). To address these limitations, we introduce LASER (Loss-Aware Singular-value dEcomposition and Rank allocation), a framework designed to facilitate efficient, low-precision inference in VLMs. LASER formulates a curvature-weighted Singular Value Decomposition (SVD) objective derived from a second-order approximation of the model’s loss, leveraging Kronecker-factored Fisher information to direct the decomposition process toward enhancing downstream task performance rather than merely minimizing reconstruction error. Additionally, we propose a loss-aware, cross-layer rank allocation mechanism driven by calibration gradients, which allows for more strategic distribution of the parameter budget across network layers. We further adapt low-rank compression to FFN layers via a hybrid approach that integrates SVD with quantization. Our experimental results demonstrate that LASER delivers a decoding speedup exceeding $2.3\times$ compared to prior methods, while maintaining high accuracy during low-precision inference.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
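The curvature-weighted SVD objective mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single linear layer with weight $W$ of shape (out, in) and a Kronecker-factored Fisher approximation $F \approx A \otimes G$, where $A$ is the second moment of the layer inputs and $G$ the second moment of the output gradients on calibration data. Under that assumption, the second-order loss change caused by a perturbation $\Delta W$ is proportional to $\mathrm{tr}(G\,\Delta W\,A\,\Delta W^\top) = \|G^{1/2}\,\Delta W\,A^{1/2}\|_F^2$, so the best rank-$r$ factorization is obtained by truncating the SVD of $G^{1/2} W A^{1/2}$ and mapping the factors back. The function names and the damping term `eps` are hypothetical choices made for illustration.

```python
import numpy as np


def psd_sqrt_and_inv(M, eps=1e-6):
    """Return the symmetric square root and inverse square root of a PSD matrix.

    `eps` is an assumed damping term that keeps the inverse well defined.
    """
    vals, vecs = np.linalg.eigh(M)
    vals = np.clip(vals, 0.0, None) + eps
    root = (vecs * np.sqrt(vals)) @ vecs.T
    inv_root = (vecs / np.sqrt(vals)) @ vecs.T
    return root, inv_root


def curvature_weighted_lowrank(W, A, G, rank, eps=1e-6):
    """Rank-`rank` factors (left, right) with W ~= left @ right.

    Minimizes ||G^{1/2} (W - left @ right) A^{1/2}||_F (up to damping),
    i.e. the quadratic loss proxy under a Kronecker-factored Fisher.
    """
    G_half, G_inv_half = psd_sqrt_and_inv(G, eps)
    A_half, A_inv_half = psd_sqrt_and_inv(A, eps)
    # Truncated SVD in the curvature-weighted space.
    U, s, Vt = np.linalg.svd(G_half @ W @ A_half, full_matrices=False)
    # Map the truncated factors back to the original weight space.
    left = G_inv_half @ (U[:, :rank] * s[:rank])  # shape (out, rank)
    right = Vt[:rank] @ A_inv_half                # shape (rank, in)
    return left, right


def weighted_error(W, W_hat, A, G):
    """Quadratic loss proxy tr(G D A D^T) for D = W - W_hat."""
    D = W - W_hat
    return float(np.trace(G @ D @ A @ D.T))
```

With `A` and `G` estimated from calibration activations and gradients, `weighted_error` can be used to compare this factorization against a plain truncated SVD of `W` at the same rank. The weighted variant should never score worse on this proxy, up to the small effect of damping.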





