Anatomy of Unlearning: The Dual Impact of Fact Salience and Model Fine-Tuning
Title: Deconstructing Unlearning: The Interplay of Fact Prominence and Model Fine-Tuning
Abstract: Machine Unlearning (MU) allows Large Language Models (LLMs) to excise hazardous or obsolete data. Yet current research largely overlooks where this knowledge originates, whether in pretraining or in supervised fine-tuning (SFT), and operates under the flawed premise that all facts are equally susceptible to forgetting. To address this gap, we present DUET (Dual Unlearning Evaluation across Training Stages), a benchmark comprising 28,600 triplets derived from Wikidata. Each triplet is annotated with fact popularity metrics based on both Wikipedia link frequencies and LLM-generated salience scores. Our experiments show that pretrained and SFT models respond differently to unlearning. Specifically, when an SFT step is applied to the forget data, forgetting is smoother, tuning is more stable, and retention improves by 10% to 50%. In contrast, direct unlearning on pretrained models is unstable and often leads either to catastrophic forgetting or to relearning of the very data intended to be removed.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





