arXiv

Near-Optimal Pure Machine Unlearning for Smooth Strongly Convex Losses

June 2, 2026 · Matthew Regehr, Gautam Kamath, Andrew Lowy

Title: Achieving Near-Optimal Pure Machine Unlearning for Smooth Strongly Convex Loss Functions

Abstract: Machine unlearning is motivated by regulatory mandates and user rights, such as the "right to be forgotten," which require removing an individual's data influence from trained models. Prior work has developed unlearning algorithms and error bounds for smooth, strongly convex stochastic optimization, but the statistical cost of unlearning in this setting has not been fully understood. This study closes that gap by establishing upper and lower bounds on the excess population risk of approximate $\varepsilon$-unlearning that are tight up to a condition-number factor. For mean estimation over the unit ball, the upper and lower bounds match exactly. The optimal rate is the sum of the standard statistical error and an unlearning penalty. This penalty interpolates between the cost of retraining from scratch and an exponentially smaller quantity as the ratio $\varepsilon/d$ increases, where $d$ is the model dimension. In particular, when $\varepsilon \gg d$, $\varepsilon$-unlearning yields an exponential improvement in accuracy over both differential privacy baselines and retraining from scratch. Conversely, when $\varepsilon \le d$, retraining from scratch remains optimal.
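The abstract compares against differential privacy baselines. As a point of reference only, the sketch below shows a standard pure $\varepsilon$-DP output-perturbation baseline for mean estimation over the unit ball; it is not the paper's unlearning algorithm, which the abstract does not describe. It assumes NumPy, replace-one neighboring datasets, and hypothetical helper names (`project_to_unit_ball`, `sample_norm_laplace`, `noisy_mean`). It illustrates where a $d/\varepsilon$ dependence comes from: replacing one point moves the mean by at most $\Delta = 2/n$, and noise with density proportional to $\exp(-\varepsilon \lVert z \rVert_2 / \Delta)$ has expected norm $d\Delta/\varepsilon = 2d/(\varepsilon n)$.

```python
import numpy as np


def project_to_unit_ball(x):
    """Scale each row of x back onto the Euclidean unit ball if it lies outside."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1.0)


def sample_norm_laplace(d, scale, rng):
    """Draw z in R^d with density proportional to exp(-||z||_2 / scale).

    In polar coordinates the radius has density proportional to
    r^(d-1) * exp(-r / scale), i.e. Gamma(shape=d, scale=scale),
    and the direction is uniform on the unit sphere.
    """
    direction = rng.standard_normal(d)
    direction /= np.linalg.norm(direction)
    radius = rng.gamma(shape=d, scale=scale)
    return radius * direction


def noisy_mean(data, eps, rng):
    """Pure eps-DP mean of n points in the unit ball (replace-one neighbors).

    Hypothetical DP baseline, not the paper's method: the L2 sensitivity
    of the mean is 2/n, so noise with scale (2/n)/eps gives pure eps-DP.
    Expected noise norm is d * (2/n) / eps.
    """
    n, d = data.shape
    sensitivity = 2.0 / n
    return data.mean(axis=0) + sample_norm_laplace(d, sensitivity / eps, rng)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = project_to_unit_ball(rng.standard_normal((1000, 10)))
    estimate = noisy_mean(X, eps=1.0, rng=rng)
    print("noisy mean:", estimate)
    print("expected noise norm 2d/(eps*n):", 2 * 10 / (1.0 * 1000))
```

Because the noise norm grows linearly in $d/\varepsilon$, this kind of baseline pays a penalty that shrinks only polynomially as $\varepsilon/d$ grows, which is the regime where the abstract reports an exponential gain for $\varepsilon$-unlearning.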
