arXiv

LLM Anonymization Against Agentic Re-Identification

June 2, 2026 · Ziwen Li, Jianing Wen, Tianshi Li

Title: Defending Against Agentic Re-Identification in LLM Anonymization

Abstract: The integration of web search capabilities into agentic LLMs fundamentally shifts the threat landscape for text anonymization. In this new environment, subtle contextual hints can serve as cross-referenceable data points for re-identification, even as those same details provide essential analytical value. Current defensive strategies typically focus on stripping explicit identifiers, applying formal privacy perturbations, or validating rewritten content against models that lack web inference capabilities. Consequently, the balance between resisting agentic web-search re-identification and maintaining text utility remains largely unexplored. To address this gap, we present AURA (Anonymization with Utility-Retention Adaptation), an LLM-driven mask-reconstruct framework. This approach separates the localization of privacy risks from the reconstruction of utility-preserving content, and employs adversarial checks to verify both privacy strength and utility retention. We evaluated AURA on real-user interview transcripts, subjecting the anonymized data to re-identification attacks executed by web-search agents. Utility was measured through interviewee-profile facts, codebook facts, and a joint contextual utility grid. Our findings demonstrate that AURA advances the privacy-utility frontier: an adaptive privacy scope strengthens defenses against agentic re-identification, while the mask-reconstruct methodology better preserves contextual utility within a fixed privacy boundary.
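The abstract gives no implementation details, so the following is only a minimal sketch of the general mask-reconstruct idea with an adaptive privacy scope, not the authors' AURA implementation. Every name in it (`locate_risks`, `mask`, `reconstruct`, `anonymize`, the sample transcript, the generalization table) is hypothetical. Simple string matching stands in for the LLM components, and a lookup-based `attacker` function stands in for a web-search re-identification agent.

```python
from typing import Callable, Dict, List, Set, Tuple


def locate_risks(text: str, scope: Set[str]) -> List[str]:
    """Toy stand-in for LLM risk localization: in-scope terms found in the text."""
    return sorted(term for term in scope if term in text)


def mask(text: str, risky: List[str]) -> Tuple[str, Dict[str, str]]:
    """Replace each risky term with a numbered placeholder and remember the mapping."""
    slots: Dict[str, str] = {}
    # Longest terms first, so a short term never splits a longer one.
    for i, term in enumerate(sorted(risky, key=len, reverse=True)):
        placeholder = f"[MASK_{i}]"
        text = text.replace(term, placeholder)
        slots[placeholder] = term
    return text, slots


def reconstruct(masked: str, slots: Dict[str, str], generalize: Dict[str, str]) -> str:
    """Toy stand-in for LLM reconstruction: fill each slot with a generalized phrase."""
    for placeholder, term in slots.items():
        masked = masked.replace(placeholder, generalize.get(term, "[REDACTED]"))
    return masked


def anonymize(
    text: str,
    scope: Set[str],
    generalize: Dict[str, str],
    attacker: Callable[[str], List[str]],
    utility_facts: List[str],
    max_rounds: int = 3,
) -> Tuple[str, int, bool]:
    """Mask, reconstruct, then check privacy and utility; widen the scope on leaks."""
    scope = set(scope)
    candidate = text
    for round_no in range(1, max_rounds + 1):
        masked, slots = mask(text, locate_risks(text, scope))
        candidate = reconstruct(masked, slots, generalize)
        leaked = attacker(candidate)  # adversarial privacy check
        kept = all(fact in candidate for fact in utility_facts)  # utility check
        if not leaked and kept:
            return candidate, round_no, True
        scope |= set(leaked)  # adaptive scope: treat leaked clues as risky next round
    return candidate, max_rounds, False


# Hypothetical example data, invented for illustration only.
transcript = "Dana Ruiz, a nurse at Lakeview Clinic in Tacoma, uses the app daily."
generalize = {
    "Dana Ruiz": "The participant",
    "Lakeview Clinic": "a local clinic",
    "Tacoma": "a mid-sized city",
}
clues = ["Dana Ruiz", "Lakeview Clinic", "Tacoma"]


def attacker(text: str) -> List[str]:
    """Toy attacker: reports any known clue that survives in the text."""
    return [clue for clue in clues if clue in text]


result = anonymize(
    transcript,
    scope={"Dana Ruiz"},
    generalize=generalize,
    attacker=attacker,
    utility_facts=["nurse", "uses the app daily"],
)
print(result)
```

In this toy run the first round removes only the explicit name, the attacker still finds the clinic and city, and the second round widens the scope to cover them while the occupation and usage facts survive. Keeping localization (`locate_risks`, `mask`) separate from rewriting (`reconstruct`) lets the privacy scope change without touching how utility-bearing text is regenerated.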
