Canonicalized Stable-List Replay for Private Federated Continual Learning over Language-Model Embeddings
Abstract:
Federated continual learning (FCL) lets distributed clients adapt language-model heads to evolving NLP tasks without exchanging raw text. Under user-level differential privacy (DP), however, replay-based continual learning hits a structural barrier: each client can release only a small, noisy list of candidate replay summaries, and these lists are unordered across clients. To address this, we propose Canonicalized Stable-List Replay (CSLR). Clients privately generate candidate replay distributions in a shared sentence-embedding space, and the server aligns them using signatures derived from public anchor sentences. Crucially, the anchors provide identifiability for aggregation; they are not additional replay data. We show that, given an observable anchor-signature margin, $O(\log(N/\eta)/p)$ anchors suffice to distinguish $N$ candidate list elements with probability at least $1-\eta$. We also give a scoped non-identifiability result for unordered-label oracle models without anchors.
Across five seeds on continual classification, named entity recognition (NER), and dialogue benchmarks, CSLR improves the final average task metric by 3.9 to 5.6 points over the strongest non-CSLR DP baseline at $\varepsilon=4$ under the specified replay-release budget. CSLR also outperforms both Hungarian and optimal-transport matchers. The formal privacy guarantee covers the replay release only; end-to-end private training requires composing it with a private optimizer for the task-head updates.
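
The anchor-based alignment idea can be illustrated with a minimal sketch. This is not the paper's algorithm. It assumes, purely for illustration, that a signature is the vector of sign bits of inner products between a candidate and each public anchor embedding, that $p$ is a lower bound on the probability that a single anchor separates any fixed pair of candidates, and that anchors separate pairs independently. The function names `anchors_needed`, `signature`, and `canonicalize` are hypothetical, and the DP noise mechanism is omitted.

```python
import math


def anchors_needed(n_candidates, eta, p):
    """Anchor count from a union bound over candidate pairs.

    Assumption (not from the source): each anchor independently separates
    any fixed pair with probability >= p, so a pair stays unseparated by
    m anchors with probability <= (1 - p)^m <= exp(-p * m).
    """
    pairs = n_candidates * (n_candidates - 1) / 2
    return math.ceil(math.log(pairs / eta) / p)


def signature(candidate, anchors):
    """Hypothetical binary signature: one sign bit per public anchor."""
    return tuple(
        1 if sum(c * a for c, a in zip(candidate, anchor)) >= 0 else 0
        for anchor in anchors
    )


def canonicalize(candidates, anchors):
    """Put an unordered candidate list into signature order."""
    return sorted(candidates, key=lambda c: signature(c, anchors))


# Two clients hold noisy copies of the same four candidates, in different order.
anchors = [(1.0, 0.0), (0.0, 1.0)]
client_a = [(1.0, 1.0), (-1.0, 1.0), (1.0, -1.0), (-1.0, -1.0)]
client_b = [(-0.9, -1.1), (1.1, -0.8), (0.9, 1.2), (-1.2, 0.9)]

order_a = [signature(c, anchors) for c in canonicalize(client_a, anchors)]
order_b = [signature(c, anchors) for c in canonicalize(client_b, anchors)]
print(order_a == order_b)  # both lists now share one canonical order
```

In this toy setting the inner products stay well away from zero, which plays the role of the anchor-signature margin: small perturbations do not flip any sign bit, so the server can average the $i$-th canonical element across clients. Under the stated independence assumption, $\binom{N}{2} e^{-pm} \le \eta$ holds once $m \ge \log(\binom{N}{2}/\eta)/p$, which is $O(\log(N/\eta)/p)$.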
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC





