arXiv

PURGE: Projected Unlearning via Retain-Guided Erasure

June 3, 2026 · Vedant Jawandhia, Daksh Ahuja, Ghufran Alam Siddiqui, Prashant Trivedi, Yash Sinha, Pratik Narang

Abstract: This paper introduces PURGE, a machine unlearning (MU) method built on the premise that continual learning (CL) and MU are dual problems: they share the same core tension but pull in opposite directions. CL aims to acquire new knowledge without discarding what was learned earlier, whereas MU seeks to excise specific data points while preserving performance on the remaining data. PURGE exploits this relationship by adapting the gradient projection technique from A-GEM (Chaudhry et al., 2019) so that each unlearning step does not increase the loss on the retain set. On top of this constraint, the algorithm performs multi-layer representation erasure, which drives the forget-set activations in intermediate layers toward the distribution of the retain-set activations. This removes information from the hidden representations themselves, rather than merely suppressing it at the output layer. A key design choice in PURGE is the retain-confusion target. Instead of forcing forget-set outputs toward a uniform distribution (a strategy the paper finds to be surprisingly vulnerable to membership inference attacks, MIA), the method targets the model's own confusion patterns on retain data. As a result, the unlearned model becomes difficult to distinguish from one trained from scratch. The algorithm uses two self-regulating termination conditions, a retain-loss budget and a forget-accuracy target, which determine automatically when unlearning should stop and remove the need for manual epoch tuning. Evaluated on 22 class-level forgetting tasks across five datasets (CIFAR-10, MNIST, SVHN, STL10, and PathMNIST), PURGE consistently keeps retain accuracy above 96% while achieving MIA AUROC scores near 0.5, the ideal value, at which an attacker does no better than chance. These results place PURGE ahead of gradient ascent, KL-uniform, and several existing baselines on the privacy-utility frontier.
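
The projection constraint and the two termination conditions described in the abstract can be illustrated with a minimal sketch. It shows the standard A-GEM projection rule applied to an unlearning gradient, plus one plausible reading of the stopping checks. The abstract does not say how PURGE modifies A-GEM or how the budget and target are defined, so the function names (`project_forget_gradient`, `should_stop`), the parameters, and the exact form of the checks are assumptions made for illustration, not the authors' implementation.

```python
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Inner product of two flattened gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_forget_gradient(
    g_forget: List[float], g_retain: List[float], eps: float = 1e-12
) -> List[float]:
    """Standard A-GEM projection (assumed here; PURGE's variant may differ).

    With a descent update theta <- theta - lr * g, the first-order change in
    retain loss is -lr * <g, g_retain>. If the inner product is negative the
    step would raise the retain loss, so the conflicting component is removed.
    """
    inner = dot(g_forget, g_retain)
    if inner >= 0.0:
        return list(g_forget)  # no conflict: keep the gradient unchanged
    scale = inner / (dot(g_retain, g_retain) + eps)
    return [gf - scale * gr for gf, gr in zip(g_forget, g_retain)]


def should_stop(
    retain_loss: float,
    baseline_retain_loss: float,
    forget_acc: float,
    retain_loss_budget: float,  # hypothetical: allowed rise in retain loss
    forget_acc_target: float,   # hypothetical: forget accuracy to reach
) -> bool:
    """Hypothetical form of the two termination conditions in the abstract."""
    over_budget = (retain_loss - baseline_retain_loss) > retain_loss_budget
    forgotten = forget_acc <= forget_acc_target
    return over_budget or forgotten
```

After projection, the unlearning gradient has a non-negative inner product with the retain gradient, so to first order the step leaves the retain loss unchanged or lowers it. In a training loop, `should_stop` would be evaluated after each step in place of a fixed epoch count.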

Source: arXiv · Generated at: 2026-06-03 00:00:00 UTC