GUDA: Counterfactual Group-wise Training Data Attribution for Diffusion Models via Unlearning
Original: arXiv:2601.22651v2 Announce Type: replace-cross
Abstract: Training-data attribution for vision generative models aims to identify the training examples that shaped a particular output. Existing techniques typically score individual instances, but users often need group-level answers, for example about a specific artistic style or object category. Group-wise attribution is counterfactual: it asks how a model’s generation would change if an entire group were excluded from the training set. The direct way to answer this is Leave-One-Group-Out (LOGO) retraining, in which the model is retrained once per group, each time omitting that group. This becomes computationally prohibitive as the number of groups grows. To address this, we introduce GUDA (Group Unlearning-based Data Attribution), a method tailored to diffusion models. Rather than training new models from scratch, GUDA approximates each counterfactual by applying machine unlearning to a single model trained on the full dataset. It measures a group's influence as the difference in a likelihood-based score (the evidence lower bound, ELBO) between the original full-data model and the corresponding unlearned counterfactual. In experiments on CIFAR-10 and on artistic style attribution with Stable Diffusion, GUDA identifies the primary contributing groups more accurately than semantic similarity, gradient-based attribution, and instance-level unlearning methods. GUDA is also approximately 100 times faster than LOGO retraining on CIFAR-10.
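The scoring step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `group_attribution`, the scorer callables, and the toy values are hypothetical, and the sign convention (full-model ELBO minus unlearned-model ELBO, so a larger drop means more influence) is an assumption, since the abstract only says the method uses the difference in ELBO.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a scorer maps a generated sample to a scalar
# ELBO-like score. In practice this would wrap a diffusion model.
Scorer = Callable[[object], float]


def group_attribution(
    sample: object,
    full_scorer: Scorer,
    unlearned_scorers: Dict[str, Scorer],
) -> List[Tuple[str, float]]:
    """Rank groups by how much the score drops once the group is unlearned.

    Assumed convention: influence = ELBO(full model) - ELBO(unlearned model).
    """
    base = full_scorer(sample)
    influence = {
        group: base - scorer(sample)
        for group, scorer in unlearned_scorers.items()
    }
    # Most influential group first.
    return sorted(influence.items(), key=lambda kv: kv[1], reverse=True)


# Toy stand-ins for real models (made-up numbers, for illustration only).
toy_full: Scorer = lambda x: -10.0
toy_unlearned: Dict[str, Scorer] = {
    "cats": lambda x: -25.0,   # large drop: group mattered a lot
    "dogs": lambda x: -11.0,
    "ships": lambda x: -10.5,  # small drop: group mattered little
}

ranking = group_attribution("generated-image", toy_full, toy_unlearned)
for group, score in ranking:
    print(f"{group}: {score:.1f}")
```

Under this sketch, one unlearned counterfactual is needed per group, but each starts from the single full-data model rather than from scratch, which is where the abstract locates the savings over LOGO retraining.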
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC




