arXiv

Unlearning Isn't Invisible: Detecting Unlearning Traces in LLMs from Model Outputs


Abstract:

Machine unlearning (MU) for large language models (LLMs), often called LLM unlearning, aims to remove specific unwanted data or knowledge from a trained model while preserving its performance on standard tasks. Unlearning is essential for protecting data privacy, complying with copyright law, and mitigating the sociotechnical risks of LLMs. However, our research highlights a previously overlooked vulnerability that arises after unlearning: the traces it leaves behind are detectable.

We find that unlearning leaves persistent "fingerprints" in LLMs. These traces are visible both in the model’s internal representations and in its behavioral outputs. Notably, they can be detected from model responses even when the inputs are unrelated to the forgotten data. Specifically, a simple supervised classifier can accurately determine whether a model has undergone unlearning using only its prediction logits or its generated text.
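The detection setup described above can be sketched as a small binary classifier over logit summaries. This is a minimal sketch under stated assumptions, not the authors' method. The feature set (entropy, top-1 probability, top-1/top-2 margin), the helper names (`logit_features`, `train_logistic`, `run_demo`), and the synthetic logits are all hypothetical. The synthetic data plants an arbitrary difference in peak sharpness between the two classes purely so the classifier has something to learn. It makes no claim about what real unlearning traces look like.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over one next-token logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_features(logits: Sequence[float]) -> Vector:
    """Hypothetical features: entropy, top-1 probability, top-1 minus top-2 margin."""
    probs = sorted(softmax(logits), reverse=True)
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return [entropy, probs[0], probs[0] - probs[1]]


def sigmoid(z: float) -> float:
    """Overflow-safe logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardize(train: List[Vector], test: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """Z-score both splits using statistics from the training split only."""
    d = len(train[0])
    mu = [sum(r[j] for r in train) / len(train) for j in range(d)]
    # Fall back to a scale of 1 if a feature has zero variance.
    sd = [math.sqrt(sum((r[j] - mu[j]) ** 2 for r in train) / len(train)) or 1.0
          for j in range(d)]

    def scale(rows: List[Vector]) -> List[Vector]:
        return [[(r[j] - mu[j]) / sd[j] for j in range(d)] for r in rows]

    return scale(train), scale(test)


def train_logistic(X: List[Vector], y: List[int], lr: float = 0.5,
                   epochs: int = 300) -> Tuple[Vector, float]:
    """Full-batch gradient descent on the mean logistic (cross-entropy) loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w: Vector, b: float, x: Vector) -> int:
    """1 = flagged as unlearned, 0 = flagged as original."""
    return int(sum(wj * xj for wj, xj in zip(w, x)) + b > 0.0)


def synthetic_logits(rng: random.Random, unlearned: bool, vocab: int = 20) -> Vector:
    """Toy stand-in for real logits; the class gap (peak height) is planted by construction."""
    logits = [rng.gauss(0.0, 1.0) for _ in range(vocab)]
    logits[rng.randrange(vocab)] += 2.0 if unlearned else 5.0
    return logits


def run_demo(seed: int = 0, n_train: int = 200, n_test: int = 100) -> float:
    """Train on one synthetic split and return accuracy on a held-out split."""
    rng = random.Random(seed)

    def split(n: int) -> Tuple[List[Vector], List[int]]:
        labels = [i % 2 for i in range(n)]
        return [logit_features(synthetic_logits(rng, bool(t))) for t in labels], labels

    X_tr, y_tr = split(n_train)
    X_te, y_te = split(n_test)
    X_tr, X_te = standardize(X_tr, X_te)
    w, b = train_logistic(X_tr, y_tr)
    return sum(predict(w, b, x) == t for x, t in zip(X_te, y_te)) / len(y_te)
```

In practice, the feature vectors would come from the original and unlearned models' logits on the same (possibly forget-irrelevant) prompts. Plain logistic regression is used here only because it is the simplest supervised classifier that fits in a few lines. A text-based detector would replace `logit_features` with features of the generated responses.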

Further analysis shows that these traces reside in intermediate activations and propagate nonlinearly to the final layer, forming low-dimensional, learnable manifolds in activation space. Our experiments show that unlearning traces can be detected with over 90% accuracy, even with forget-irrelevant inputs, and that larger LLMs are more detectable. These results indicate that unlearning leaves measurable signatures. This introduces a new risk: once a model is identified as unlearned, the forgotten information may become vulnerable to reverse-engineering from a given input query.
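The idea that traces occupy a low-dimensional region of activation space can be illustrated with a linear stand-in. The sketch projects pooled hidden states onto their top principal component and checks whether the two classes separate along it. This is a hypothetical sketch that assumes synthetic activations in which the class difference is planted along a single hidden direction `u`. The paper reports nonlinear, manifold-shaped traces, which a one-dimensional PCA projection does not capture. All names (`top_principal_component`, `synthetic_activation`, `run_demo`) are illustrative.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    d = len(rows[0])
    return [sum(r[j] for r in rows) / len(rows) for j in range(d)]


def top_principal_component(rows: List[Vector], iters: int = 100,
                            seed: int = 0) -> Tuple[Vector, Vector]:
    """Return (mean, unit top eigenvector of the sample covariance) via power iteration."""
    mu = mean_vector(rows)
    centered = [[x - m for x, m in zip(r, mu)] for r in rows]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in mu]
    for _ in range(iters):
        # Covariance-vector product without forming the d x d matrix:
        # C v is proportional to sum_i (x_i . v) x_i over centered rows x_i.
        nxt = [0.0] * len(v)
        for r in centered:
            dot = sum(a * b for a, b in zip(r, v))
            nxt = [prev + dot * a for prev, a in zip(nxt, r)]
        norm = math.sqrt(sum(a * a for a in nxt))
        v = [a / norm for a in nxt]
    return mu, v


def project(x: Vector, mu: Vector, v: Vector) -> float:
    """Coordinate of x along unit direction v after centering at mu."""
    return sum((a - m) * b for a, m, b in zip(x, mu, v))


def random_unit(rng: random.Random, d: int) -> Vector:
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]
    n = math.sqrt(sum(a * a for a in u))
    return [a / n for a in u]


def synthetic_activation(rng: random.Random, u: Vector, unlearned: bool,
                         shift: float = 2.5) -> Vector:
    """Toy hidden state: isotropic noise plus a +/- shift along the planted direction u."""
    s = shift if unlearned else -shift
    return [rng.gauss(0.0, 1.0) + s * a for a in u]


def run_demo(d: int = 16, n_train: int = 200, n_test: int = 100,
             seed: int = 0) -> Tuple[float, float]:
    """Return (held-out accuracy on the 1-D projection, |cos(u, v)|)."""
    rng = random.Random(seed)
    u = random_unit(rng, d)

    def split(n: int) -> Tuple[List[Vector], List[int]]:
        labels = [i % 2 for i in range(n)]
        return [synthetic_activation(rng, u, bool(t)) for t in labels], labels

    X_tr, y_tr = split(n_train)
    X_te, y_te = split(n_test)
    mu, v = top_principal_component(X_tr, seed=seed)

    # Nearest-centroid rule on the projection; indifferent to the arbitrary sign of v.
    z_tr = [project(x, mu, v) for x in X_tr]
    c0 = sum(z for z, t in zip(z_tr, y_tr) if t == 0) / y_tr.count(0)
    c1 = sum(z for z, t in zip(z_tr, y_tr) if t == 1) / y_tr.count(1)

    def classify(x: Vector) -> int:
        z = project(x, mu, v)
        return 1 if abs(z - c1) < abs(z - c0) else 0

    accuracy = sum(classify(x) == t for x, t in zip(X_te, y_te)) / len(y_te)
    alignment = abs(sum(a * b for a, b in zip(u, v)))
    return accuracy, alignment
```

`run_demo` returns the held-out accuracy of a nearest-centroid rule on the projection and the absolute cosine between the recovered direction and the planted one. With real models, the synthetic rows would be replaced by intermediate-layer activations, and a nonlinear probe would likely be needed to follow the traces as they propagate to the final layer.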


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
