RULER: Representation-Level Verification of Machine Unlearning
Abstract
The goal of machine unlearning is to eliminate the impact of particular training data points from an existing model, bypassing the need for complete retraining. Existing verification standards rely on output-level assessments, such as membership inference, retention accuracy, and accuracy on the forget set. However, a model can meet all these criteria while still retaining encoded information about the forgotten records within its intermediate layers. To address this gap, we present RULER, a framework comprising representation-level verification metrics.
RULER includes two primary metrics: M2 and M4. M2 is an oracle-comparative measure that evaluates whether records from the forget set reside in the same representational space as they would in a model retrained without that data. Conversely, M4 is an oracle-free metric that identifies residuals by analyzing the unlearned model’s internal similarity structure, requiring no retraining.
We evaluated four approximate unlearning methods, all of which passed output-level evaluations. Under a linear mixed-effects model, however, M2 revealed significant residuals in 10 of 12 conditions (p<0.05), with effect sizes increasing with the forget fraction. A fifth method, Bad Teacher, exhibited similar residuals despite using a distinct forgetting mechanism. M4 also serves as a pre-unlearning diagnostic across domains, including tabular data, images, clinical text, and face identities. In face recognition, M4 uncovered identity-level memorization and identified cases where none of the tested methods fully removed the signal.
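To make the oracle-comparative idea behind M2 concrete, the sketch below compares forget-set activations from two models at one layer. The abstract does not define M2, so this is not RULER's metric: linear centered kernel alignment (CKA) is swapped in as a generic representation-similarity score, and the function name, array shapes, and synthetic activations are all assumptions made for illustration.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two activation matrices of shape (n_samples, n_features).

    Stand-in similarity score (an assumption, not RULER's M2).
    Returns a value in [0, 1]; 1 means identical geometry up to
    rotation and isotropic scaling.
    """
    # Center each feature so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Synthetic stand-ins for forget-set activations (hypothetical data):
# 200 forget-set records, 16-dimensional layer output.
rng = np.random.default_rng(0)
retrained = rng.normal(size=(200, 16))          # "oracle" model retrained without the forget set
q, _ = np.linalg.qr(rng.normal(size=(16, 16)))  # random orthogonal matrix
rotated = retrained @ q                         # same geometry, different basis
unrelated = rng.normal(size=(200, 16))          # geometry unrelated to the oracle

same_geometry = linear_cka(rotated, retrained)
different_geometry = linear_cka(unrelated, retrained)
print(f"same geometry: {same_geometry:.3f}, different geometry: {different_geometry:.3f}")
```

In a real check, `retrained` and the comparison matrix would hold activations for the same forget-set records taken from the retrained oracle and from the unlearned model. A low score would indicate that the unlearned model places those records differently than the oracle does, which is the kind of residual an output-level test can miss.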
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC




