ReLoRA: Knowledge-Reusing Adaptation for Fast Rollout of Evolving LLM Services
Title: ReLoRA: Efficiently Reusing Knowledge for Rapid Deployment of Evolving Large Language Model Services
Abstract
Large Language Models (LLMs) are increasingly deployed as services that evolve continuously. Updates to the base model often render previously deployed task-specific Low-Rank Adaptation (LoRA) adapters obsolete. For a service provider managing a large portfolio of downstream model services, retraining every LoRA adapter from scratch after each base model update is computationally expensive and delays service rollout. The simpler alternative, applying the original LoRA adapter directly to the updated base model, often degrades service quality because the adapter is incompatible with the new backbone.
To address this challenge, we introduce ReLoRA, a knowledge-reusing re-adaptation framework that efficiently restores LoRA adapters to a service-ready state for evolving LLMs while maintaining or improving task performance. ReLoRA relies on two optimization stages:
- Adaptive LoRA Initialization: This stage uses Bayesian optimization to select a compatibility-aware starting point, combining information from the previously deployed task adapter with information about how the base model has evolved.
- Scheduled Regularization Fine-Tuning: Training begins with strong regularization to guide the adapter rapidly toward a high-quality solution space, then relaxes the regularization to allow precise, task-specific refinement.
This design enables fast recovery of service quality while keeping re-adaptation overhead low. Extensive experiments show that, relative to baseline methods, ReLoRA shortens time-to-readiness by up to 8.9$\times$ and improves accuracy by up to 4.6%.
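The two-phase idea behind Scheduled Regularization Fine-Tuning can be illustrated with a minimal toy sketch. The abstract does not specify the regularizer, the schedule shape, or any hyperparameters, so everything here is an assumption made for illustration: the penalty is a squared distance to a hypothetical `anchor` value (standing in for the initialized adapter), the schedule is a hard switch at `switch_frac` of training, and the "task loss" is a one-dimensional quadratic. The names `reg_weight` and `fine_tune` are hypothetical and not taken from the paper.

```python
def reg_weight(step, total_steps, strong=1.0, relaxed=0.01, switch_frac=0.3):
    """Assumed two-phase schedule: strong weight early, relaxed weight later."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    switch_step = int(total_steps * switch_frac)
    return strong if step < switch_step else relaxed


def fine_tune(theta_init, anchor, target, total_steps=200, lr=0.1):
    """Toy gradient descent on (theta - target)^2 + lam(step) * (theta - anchor)^2.

    `target` plays the role of the task optimum and `anchor` the point that
    the regularizer pulls toward; both are illustrative scalars.
    """
    theta = theta_init
    for step in range(total_steps):
        lam = reg_weight(step, total_steps)
        # Gradient of the task term plus gradient of the scheduled penalty.
        grad = 2.0 * (theta - target) + 2.0 * lam * (theta - anchor)
        theta -= lr * grad
    return theta


if __name__ == "__main__":
    final = fine_tune(theta_init=0.0, anchor=0.0, target=2.0)
    print(f"final parameter: {final:.4f}")
```

In this toy setting the strong phase holds the parameter near the anchor, and the relaxed phase lets it settle close to the task optimum. A real adapter would replace the scalar with the LoRA matrices, and a smooth decay could replace the hard switch.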
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



