Repurposing Adversarial Perturbations for Continual Learning: From Defense to Active Alignment
Title: Leveraging Adversarial Perturbations for Continual Learning: Shifting from Defense to Active Alignment
Abstract
In fluid environments, large language models must continuously adapt to emerging tasks. However, continual learning (CL) is frequently hindered by catastrophic forgetting, constrained transferability, and susceptibility to adversarial perturbations. To overcome these challenges, we introduce AdvCL, a framework that repurposes adversarial perturbations as a geometric control signal to facilitate stable and continuous adaptation.
AdvCL integrates three distinct, plug-in modules designed to enhance stability and performance:
1. Intra-Smooth: Encourages local smoothness through the application of minor adversarial perturbations.
2. Proto-Clip: Employs similarity clipping to prevent the model from over-aligning with the prototype of the current task.
3. Inter-Align: Directs alignment toward the prototype of previous tasks, thereby minimizing representational gaps between tasks.
Our experimental results demonstrate that AdvCL yields consistent improvements in both standard accuracy and robustness, while significantly reducing forgetting and enhancing transfer capabilities. We further dissect the underlying mechanisms by quantifying how Intra-Smooth responds to variations in perturbation settings and evaluating the impact of Inter-Align on task similarity and geometric distance. Ultimately, our analysis reveals that while the modules offer complementary benefits when used together, each can also be independently integrated into various CL paradigms (such as replay-based, regularization-based, and dynamic architecture approaches), providing a versatile geometric control mechanism for continual learning.
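The abstract gives no formulas for the three modules, so the following is only an illustrative sketch of how such terms could look, not the authors' implementation. Everything specific in it is an assumption: cosine similarity as the similarity measure, the clipping threshold `tau`, the perturbation budget `eps`, a squared-distance smoothness penalty, and a one-step sign perturbation whose gradient is estimated by finite differences so that the example needs only the standard library. The function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)

def sign_perturbation(encode, x, proto, eps=0.05, h=1e-4):
    """One-step sign perturbation (FGSM-like, an assumption) that pushes
    encode(x) away from proto. The gradient is estimated with central
    finite differences instead of autodiff."""
    def loss(inp):
        return 1.0 - cosine(encode(inp), proto)
    delta = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + h] + x[i + 1:]
        down = x[:i] + [x[i] - h] + x[i + 1:]
        g = (loss(up) - loss(down)) / (2 * h)
        delta.append(eps * ((g > 0) - (g < 0)))  # eps * sign(g)
    return delta

def intra_smooth_loss(encode, x, proto, eps=0.05):
    """Assumed Intra-Smooth term: squared distance between the clean and
    the perturbed representation (smaller means locally smoother)."""
    delta = sign_perturbation(encode, x, proto, eps)
    z_clean = encode(x)
    z_adv = encode([a + d for a, d in zip(x, delta)])
    return sum((a - b) ** 2 for a, b in zip(z_clean, z_adv))

def proto_clip_loss(z, proto_cur, tau=0.8):
    """Assumed Proto-Clip term: similarity to the current-task prototype
    is rewarded only up to the cap tau, so nothing is gained beyond it."""
    return -min(cosine(z, proto_cur), tau)

def inter_align_loss(z, proto_prev):
    """Assumed Inter-Align term: pull z toward a previous-task prototype."""
    return 1.0 - cosine(z, proto_prev)

# Toy usage with a hypothetical linear encoder and hand-picked prototypes.
encode = lambda x: [2.0 * x[0], 0.5 * x[1]]
x = [1.0, 1.0]
proto_cur, proto_prev = [1.0, 0.0], [0.0, 1.0]
z = encode(x)
total = (intra_smooth_loss(encode, x, proto_cur)
         + proto_clip_loss(z, proto_cur)
         + inter_align_loss(z, proto_prev))
```

Because each term is a separate function of the representation and a prototype, any one of them could in principle be added on its own to an existing continual-learning objective, which mirrors the plug-in property the abstract claims. How the terms are actually weighted and combined is not stated in the abstract.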
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




