arXiv

Unlearning with Asymmetric Sources: Improved Unlearning-Utility Trade-off with Public Data

June 3, 2026 · Ahmed Mehdi Inane, Vincent Quirion, Gintare Karolina Dziugaite, Ioannis Mitliagkas

Title: Enhancing the Unlearning-Utility Balance via Asymmetric Sources and Public Data

Abstract:

Current noise-based certified machine unlearning methods encounter a significant limitation: the substantial noise levels necessary to verify unlearning often severely degrade model utility, especially when handling large-scale data deletions. Although incorporating public data is a well-established strategy in differential privacy to alleviate this conflict, its application to unlearning has not yet been investigated. To bridge this gap, we present Asymmetric Langevin Unlearning (ALU), a novel framework that utilizes public data to reduce privacy-related costs. We demonstrate that injecting public data reduces the unlearning cost by a factor of $O(1/n_{\mathrm{pub}}^2)$, thereby offering a distinct computational benefit compared to full retraining. This finding introduces a new control lever: by increasing the amount of public data available, practitioners can lower the required noise level and consequently minimize utility loss. Furthermore, we examine the practical scenario of distribution mismatch, detailing how discrepancies between public and private data sources affect performance. Our results indicate that ALU facilitates the mass unlearning of constant dataset fractions (a scenario where traditional symmetric approaches are inefficient) while sustaining high utility. Empirical tests, utilizing membership inference attacks and variational Rényi divergence, validate that ALU successfully prevents privacy breaches while maintaining model effectiveness even under moderate distribution shifts.
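
To make the stated $O(1/n_{\mathrm{pub}}^2)$ scaling concrete, the following is a minimal arithmetic sketch, not the authors' method. It assumes the unlearning cost is exactly proportional to $1/n_{\mathrm{pub}}^2$, dropping constants and every other dependency the full bound may have; the function name `unlearning_cost_ratio` and the sample sizes are hypothetical and chosen only for illustration.

```python
def unlearning_cost_ratio(n_pub_old: int, n_pub_new: int) -> float:
    """Relative change in unlearning cost when the public set grows.

    Assumption: cost is proportional to 1 / n_pub**2, so the ratio of
    new cost to old cost is (n_pub_old / n_pub_new) ** 2. Constants and
    any other terms of the actual bound are ignored.
    """
    if n_pub_old <= 0 or n_pub_new <= 0:
        raise ValueError("public sample counts must be positive")
    return (n_pub_old / n_pub_new) ** 2


# Illustrative values only: grow a hypothetical public set of 1,000 samples.
for factor in (1, 2, 4, 10):
    ratio = unlearning_cost_ratio(1_000, 1_000 * factor)
    print(f"n_pub x{factor}: cost scaled by {ratio:.4f}")
```

Under this simplification, doubling the public data quarters the cost term, which is the sense in which the amount of public data acts as a control lever.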
