Initialization is Half the Battle: Generating Diverse Images from a Guidance Potential Posterior
Abstract:
Although generative models have achieved impressive fidelity, they are often plagued by mode collapse. Current approaches to improving diversity concentrate largely on intervening within the generation trajectory. We argue that this overlooks a critical flaw: standard Gaussian initialization is indifferent to the structure of the guidance potential landscape, so it frequently drives trajectories toward dominant modes. To address this, we propose sampling the initial noise from a guidance potential posterior, which re-weights the prior to favor regions that yield diverse outputs. To sample from this distribution efficiently, we introduce Diversity-inducing Initialization (DivIn). DivIn uses Langevin dynamics to actively traverse the initialization landscape, steering the initial noise away from collapse-prone regions while anchoring it to the valid data manifold. DivIn is an inference-time enhancement compatible with both flow matching and diffusion models. Extensive experiments show that DivIn outperforms existing methods on both text-to-image and class-to-image tasks. Moreover, because DivIn is orthogonal to trajectory-based techniques, combining the two pushes the diversity-quality Pareto frontier significantly beyond what either approach achieves on its own.
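The abstract does not specify the guidance potential or the exact Langevin update, so the following is only a minimal sketch under stated assumptions, not DivIn itself. It assumes the posterior over initial noise has the form p(z) ∝ N(z; 0, I) · exp(−w · U(z)), where U is a stand-in potential supplied by the caller and w is a hypothetical weight, and it draws a batch of initial noise vectors with the unadjusted Langevin algorithm. The −z term contributed by the Gaussian prior is what we assume plays the anchoring role, keeping the noise near the region a standard initialization would produce. The function name langevin_initial_noise and its parameters are illustrative, and the quadratic potential in the usage lines is a toy chosen only because its posterior is known in closed form.

```python
import numpy as np


def langevin_initial_noise(grad_potential, shape, n_steps=500, step_size=0.05,
                           weight=1.0, seed=None):
    """Hypothetical sketch: sample initial noise from a re-weighted Gaussian prior.

    Assumed target (not taken from the paper):
        p(z) ∝ N(z; 0, I) * exp(-weight * U(z)),
    where grad_potential(z) returns the gradient of a caller-supplied potential U.
    Sampling uses the unadjusted Langevin algorithm (ULA).
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(shape)  # start from standard Gaussian initialization
    for _ in range(n_steps):
        # Score of the assumed target: -z comes from the Gaussian prior,
        # -weight * grad U(z) re-weights the prior according to the potential.
        score = -z - weight * grad_potential(z)
        noise = rng.standard_normal(shape)
        # ULA step: half-step drift along the score plus injected Gaussian noise.
        z = z + 0.5 * step_size * score + np.sqrt(step_size) * noise
    return z


# Toy check: U(z) = 0.5 * ||z - mu||^2, so the target is N(mu / 2, I / 2).
mu = np.array([2.0, -1.0])
samples = langevin_initial_noise(lambda z: z - mu, shape=(4000, 2), seed=0)
print(samples.mean(axis=0))  # should be close to mu / 2 = [1.0, -0.5]
print(samples.var(axis=0))   # should be close to 0.5
```

With a finite step size, ULA is slightly biased: for this toy target the stationary per-dimension variance is 1 / (2 − step_size) ≈ 0.513 rather than exactly 0.5, while the mean is unbiased. Setting weight=0.0 recovers sampling from the plain Gaussian prior, which is a convenient sanity check before plugging in a real potential.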
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




