arXiv

You Don't Need All That Attention: Surgical Memorization Mitigation in Text-to-Image Diffusion Models

June 2, 2026 · Kairan Zhao, Eleni Triantafillou, Peter Triantafillou

Original: arXiv:2603.00133v2 Abstract: Generative models have been shown to "memorize" certain training data, leading to verbatim or near-verbatim reproduction of training images, which may cause privacy concerns or copyright infringement. We introduce Guidance Using Attractive-Repulsive Dynamics (GUARD), a novel framework for memorization mitigation in text-to-image diffusion models. GUARD adjusts the image denoising process to guide the generation away from an original training image and towards one that is distinct from training data while remaining aligned with the prompt, guarding against reproducing training data, without hurting image generation quality. We propose a concrete instantiation of this framework, where the positive target that we steer towards is given by a novel method for (cross) attention attenuation based on (i) a novel statistical mechanism that automatically identifies the prompt positions where cross attention must be attenuated and (ii) attenuating cross-attention in these per-prompt locations. The resulting GUARD offers a surgical, dynamic per-prompt inference-time approach that, we find, is by far the most robust method in terms of consistently producing state-of-the-art results for memorization mitigation across two architectures and for both verbatim and template memorization, while also improving upon or yielding comparable results in terms of image quality.

Rewritten:

Abstract: Generative models can "memorize" certain training data and then produce images that are exact or nearly exact copies of it, raising copyright and privacy concerns. To address this, we present Guidance Using Attractive-Repulsive Dynamics (GUARD), a new framework for mitigating memorization in text-to-image diffusion models. GUARD modifies the denoising process to steer generation away from an original training image and toward one that is distinct from the training data while remaining aligned with the prompt. This guards against reproducing training data without hurting image generation quality. We propose a concrete instantiation of the framework in which the positive target to steer toward is produced by a new cross-attention attenuation method. The method consists of (i) a statistical mechanism that automatically identifies the prompt positions where cross-attention must be attenuated, and (ii) attenuation of cross-attention at those per-prompt positions. The resulting GUARD is a surgical, dynamic, per-prompt, inference-time approach. We find it to be by far the most robust method, consistently producing state-of-the-art memorization mitigation across two architectures and for both verbatim and template memorization, while improving upon or matching image quality.
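To make the two ingredients named in the abstract more concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract gives no formulas, so everything here is an assumption for illustration: the z-score outlier rule stands in for the paper's unspecified statistical mechanism, the column-scaling-plus-renormalization rule stands in for its attenuation step, and the guidance formula is a generic positive/negative extension of classifier-free guidance. All function names, thresholds, and weights (`flag_positions`, `attenuate`, `guided_noise`, `z_thresh`, `scale`, `w_pos`, `w_neg`) are hypothetical.

```python
import numpy as np

def flag_positions(token_attn, z_thresh=2.0):
    """Flag prompt positions whose attention mass is a statistical outlier.

    token_attn: shape (num_tokens,), e.g. cross-attention averaged over pixels.
    ASSUMPTION: a plain z-score test; the paper's actual mechanism is not
    described in the abstract.
    """
    sigma = token_attn.std()
    if sigma == 0:
        return np.zeros(token_attn.shape, dtype=bool)
    return (token_attn - token_attn.mean()) / sigma > z_thresh

def attenuate(attn_map, mask, scale=0.1):
    """Scale down attention to flagged positions, then renormalize each row.

    attn_map: shape (num_pixels, num_tokens), rows sum to 1 (post-softmax).
    ASSUMPTION: multiplicative attenuation with a hypothetical `scale`.
    """
    out = attn_map.copy()
    out[:, mask] *= scale
    return out / out.sum(axis=1, keepdims=True)

def guided_noise(eps_uncond, eps_pos, eps_neg, w_pos=7.5, w_neg=2.0):
    """Combine noise predictions: attract toward eps_pos, repel from eps_neg.

    ASSUMPTION: eps_pos comes from a pass with attenuated cross-attention and
    eps_neg from the unmodified (possibly memorized) prompt-conditioned pass.
    """
    return (eps_uncond
            + w_pos * (eps_pos - eps_uncond)
            - w_neg * (eps_neg - eps_uncond))
```

In this sketch a sampler would call `flag_positions` once per prompt, apply `attenuate` inside the cross-attention layers to obtain the positive prediction, and pass both predictions to `guided_noise` at each denoising step. The per-prompt mask is what would make such a scheme "surgical": only the flagged positions are touched, and all other prompt tokens keep their relative attention.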

Source: arXiv