arXiv

Compress then Merge: From Multiple LoRAs into One Low-Rank Adapter

June 3, 2026 · Zhengbao He, Ruiqi Ding, Zhehao Huang, Ruikai Yang, Tao Li, Xiaolin Huang

Abstract:

Low-rank adaptation (LoRA) enables parameter-efficient customization of foundation models. However, the growing number of task-specific adapters fragments capabilities across many separate adapters, which hinders both reuse and deployment. This paper studies how to consolidate $T$ distinct LoRAs into a single rank-$r$ LoRA, so that the result retains the advantages of a low-rank representation. Current approaches typically follow a "Merge-then-Compress" workflow: adapters are first combined in the full parameter space, and the result is then compressed to rank $r$ using truncated singular value decomposition (SVD). The initial merge generally does not preserve low-rank structure (a combination of $T$ rank-$r$ updates can have rank up to $Tr$), so the subsequent compression step struggles to recover a high-quality rank-$r$ adapter.
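The Merge-then-Compress workflow described above can be illustrated with a minimal NumPy sketch. It assumes each adapter is a pair of factors `B_t` (shape `d_out x r`) and `A_t` (shape `r x d_in`) with update `B_t @ A_t`, and it uses plain averaging as the full-space merge. The function name `merge_then_compress`, the averaging rule, and the random toy factors are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def merge_then_compress(Bs, As, r):
    """Average T LoRA updates in full space, then truncate to rank r via SVD.

    Assumption: plain averaging stands in for whatever full-space merge a
    given baseline actually uses.
    """
    # Full-space merge: each update is B_t @ A_t, a (d_out, d_in) matrix.
    merged = sum(B @ A for B, A in zip(Bs, As)) / len(Bs)
    # Truncated SVD keeps only the top-r singular directions.
    U, S, Vt = np.linalg.svd(merged, full_matrices=False)
    B_new = U[:, :r] * S[:r]   # (d_out, r), singular values folded into B
    A_new = Vt[:r, :]          # (r, d_in)
    return B_new, A_new, merged

# Toy example with random (hypothetical) adapters.
rng = np.random.default_rng(0)
d_out, d_in, r, T = 32, 24, 4, 3
Bs = [rng.standard_normal((d_out, r)) for _ in range(T)]
As = [rng.standard_normal((r, d_in)) for _ in range(T)]

B_new, A_new, merged = merge_then_compress(Bs, As, r)
merged_rank = np.linalg.matrix_rank(merged)
# Relative Frobenius error introduced by truncating the merged update.
rel_err = np.linalg.norm(merged - B_new @ A_new) / np.linalg.norm(merged)
print(f"rank of merged update: {merged_rank} (r = {r}, T*r = {T * r})")
print(f"relative truncation error: {rel_err:.3f}")
```

In this toy setting the merged update has rank above $r$, so the truncation step necessarily discards part of it; how much that matters for real adapters is an empirical question the sketch does not answer.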

To address this, we introduce Compress-then-Merge (CtM), a novel pipeline that reverses the conventional order by imposing the rank-$r$ constraint prior to merging. CtM identifies shared $r$-dimensional subspaces derived exclusively from LoRA weights to capture common structures across adapters. It then projects each adapter into these shared subspaces to derive $r\times r$ coordinates, applying standard merging techniques within this reduced dimensional space. By design, CtM ensures the output is a rank-$r$ LoRA, eliminating the need for post-hoc truncation, and facilitates efficient computation within the core space defined by the concatenated LoRA factors. Empirical evaluations across various models and tasks demonstrate that CtM consistently surpasses existing baselines that produce a single LoRA, while also closing the performance gap with full-parameter merging methods.
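One way to picture the Compress-then-Merge steps above is the following NumPy sketch. The abstract does not say how the shared subspaces are computed, so the sketch makes a loud assumption: the shared bases are the top-$r$ singular vectors of the concatenated `B` factors and of the stacked `A` factors. The function name `compress_then_merge`, that subspace choice, and the use of plain averaging as the "standard merging technique" are all hypothetical and may differ from the authors' method.

```python
import numpy as np

def compress_then_merge(Bs, As, r):
    """Sketch: project each LoRA onto shared rank-r bases, merge r x r coordinates.

    ASSUMPTION: shared bases are taken from SVDs of the concatenated LoRA
    factors; the paper may define them differently.
    """
    B_cat = np.concatenate(Bs, axis=1)   # (d_out, T*r)
    A_cat = np.concatenate(As, axis=0)   # (T*r, d_in)
    # Shared output-side basis U (d_out, r) and input-side basis Vt (r, d_in).
    U = np.linalg.svd(B_cat, full_matrices=False)[0][:, :r]
    Vt = np.linalg.svd(A_cat, full_matrices=False)[2][:r, :]
    # r x r coordinates of each adapter, U^T (B_t A_t) V, computed from the
    # factors so the (d_out, d_in) update is never formed.
    coords = [(U.T @ B) @ (A @ Vt.T) for B, A in zip(Bs, As)]
    # ASSUMPTION: plain averaging as the merge rule in the reduced space.
    M = sum(coords) / len(coords)
    # The output is rank-r by construction: B_out is (d_out, r), A_out is (r, d_in).
    return U @ M, Vt

# Toy example with random (hypothetical) adapters.
rng = np.random.default_rng(0)
d_out, d_in, r, T = 32, 24, 4, 3
Bs = [rng.standard_normal((d_out, r)) for _ in range(T)]
As = [rng.standard_normal((r, d_in)) for _ in range(T)]

B_ctm, A_ctm = compress_then_merge(Bs, As, r)
print(B_ctm.shape, A_ctm.shape)
```

Because the merge acts only on $r \times r$ matrices, no post-hoc truncation is needed in this sketch. As a sanity check on the assumed subspace choice, passing $T$ copies of the same adapter returns that adapter's update unchanged.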
