Diffusion Models, Denoiser Architecture and Creativity
Title: The Interplay Between Denoiser Architecture and Creativity in Diffusion Models
Abstract:
Creativity in diffusion models is defined by their capacity to produce images that are both highly realistic and distinct from the data used during training. This capability is somewhat counterintuitive, as theoretical frameworks indicate that if the denoiser employed is Bayes optimal for a specific training set, the model will merely replicate those training samples. In this study, we provide both empirical evidence and theoretical analysis demonstrating that the creativity observed in diffusion models stems from the interplay between the denoiser’s architecture and the underlying target distribution.
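The memorization argument above can be illustrated with a minimal sketch. It assumes the training set is treated as a uniform empirical distribution and that the corruption is isotropic Gaussian noise with standard deviation sigma; the abstract does not specify a noise model, so both are assumptions. Under them, the Bayes optimal (posterior mean) denoiser is a softmax-weighted average of the training samples. The function name and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np

def empirical_bayes_denoiser(y, train, sigma):
    """Posterior mean E[x | y] for x uniform over `train` and y = x + sigma * noise."""
    # Squared distance from the noisy point to every training sample.
    sq_dists = np.sum((train - y) ** 2, axis=1)
    # Softmax weights over training samples, shifted for numerical stability.
    logits = -sq_dists / (2.0 * sigma ** 2)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # The output is always a convex combination of training samples.
    return weights @ train

# Toy training set of five 2-D points (illustrative, not from the paper).
train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]])
y = train[3] + np.array([0.02, -0.01])  # slightly perturbed copy of sample 3

low_noise = empirical_bayes_denoiser(y, train, sigma=0.05)
high_noise = empirical_bayes_denoiser(y, train, sigma=100.0)
print(low_noise, high_noise)
```

At low noise the weights collapse onto the nearest training sample, so the denoiser returns that sample almost exactly. At high noise the output approaches the training mean. In neither case does this denoiser produce anything beyond a weighted average of the training data.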
Theoretically, we derive explicit expressions for the distribution of generated samples as a function of the target distribution and the denoiser architecture, for three architectural types: linear, polynomial, and bottleneck. Empirically, we demonstrate that minor modifications to the widely used U-Net denoiser architecture can produce markedly different forms of creativity, and that these slight adjustments frequently yield samples that lack realism. Collectively, our findings indicate that the success of diffusion models depends critically on the inductive bias of the denoiser architecture being well aligned with the true target distribution.
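As a rough illustration of how architecture constrains what a denoiser can express, consider the linear case. The sketch below uses the standard linear MMSE (Wiener) estimator under isotropic Gaussian noise, which is a textbook result and not necessarily the parameterization or expression derived in the paper. The function name and the two toy datasets are hypothetical.

```python
import numpy as np

def linear_mmse_denoiser(train, sigma):
    """Return the best affine denoiser (in mean squared error) for y = x + sigma * noise."""
    mu = train.mean(axis=0)
    centered = train - mu
    cov = centered.T @ centered / train.shape[0]  # empirical covariance
    dim = cov.shape[0]
    # Wiener gain: cov @ (cov + sigma^2 I)^{-1}.
    gain = cov @ np.linalg.inv(cov + sigma ** 2 * np.eye(dim))
    return lambda y: mu + (y - mu) @ gain.T

# Two different toy training sets with identical mean and covariance.
a = np.sqrt(0.5)
train_a = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
train_b = np.array([[a, a], [-a, -a], [a, -a], [-a, a]])

denoise_a = linear_mmse_denoiser(train_a, sigma=1.0)
denoise_b = linear_mmse_denoiser(train_b, sigma=1.0)

y = np.array([3.0, 0.0])
out_a, out_b = denoise_a(y), denoise_b(y)
print(out_a, out_b)
```

The two training sets contain different points, yet the resulting denoisers coincide, because an affine denoiser depends on the data only through its mean and covariance. Such a denoiser cannot memorize individual samples, which gives a simple instance of architecture shaping what is generated.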
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





