Diffusing in the Right Space: A Systematic Study of Latent Diffusability
Title: Optimizing the Diffusion Environment: A Comprehensive Analysis of Latent Diffusability
Abstract:
Latent diffusion models rely on visual tokenizers to compress images into latent spaces, enabling efficient generative modeling. However, high reconstruction fidelity in a tokenizer does not guarantee superior generation performance, indicating that latent representations must be assessed not just for accuracy, but for their "diffusability." While recent research has offered various rationales for diffusion-friendly latent spaces—citing factors such as semantic separability, affine equivariance, distribution uniformity, spatial structure, spectral smoothness, and manifold continuity—these insights are frequently validated using a narrow range of tokenizers. This limitation raises questions about which factors most strongly predict downstream generation quality and whether these findings apply outside the specific contexts in which they were originally identified.
To address these gaps, this study performs a systematic investigation into latent diffusability. We train a broad array of tokenizers with varied regularization techniques, architectures, and latent configurations, and evaluate each with multiple downstream diffusion backbones. Our analysis identifies several latent properties that consistently correlate with generation quality and that generalize robustly across experimental conditions. Furthermore, we propose Velocity Irreducible Variance (VIV), a novel metric that quantifies the velocity ambiguity caused by trajectory crossings. Extensive experiments show that VIV is among the most reliable predictors of generation quality.
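As a rough illustration of what "velocity ambiguity caused by trajectory crossings" could mean, the sketch below estimates how much variance remains in the flow-matching velocity target after conditioning on the noisy point. This is not the paper's definition of VIV, which the abstract does not give. The linear path x_t = (1 - t) x_0 + t eps, the use of an empirical set of latents as the data distribution, and the function name `velocity_conditional_variance` are all assumptions made for illustration.

```python
import numpy as np


def velocity_conditional_variance(latents, t, n_samples=256, seed=0):
    """Monte Carlo estimate of E[ Var(v | x_t) ] at a fixed time t in (0, 1).

    Assumptions (not taken from the paper):
      * path:     x_t = (1 - t) * x0 + t * eps, with eps ~ N(0, I)
      * velocity: v = eps - x0, which equals (x_t - x0) / t on this path
      * the data distribution is the empirical set `latents` of shape (N, d)
    """
    rng = np.random.default_rng(seed)
    n, d = latents.shape

    # Draw noisy points x_t by pairing random latents with fresh noise.
    idx = rng.integers(0, n, size=n_samples)
    eps = rng.standard_normal((n_samples, d))
    x_t = (1.0 - t) * latents[idx] + t * eps                      # (S, d)

    # Posterior over which latent produced x_t:
    # p(i | x_t) is proportional to N(x_t; (1 - t) * x0_i, t^2 * I).
    diff = x_t[:, None, :] - (1.0 - t) * latents[None, :, :]      # (S, N, d)
    log_w = -0.5 * np.sum(diff ** 2, axis=-1) / t ** 2            # (S, N)
    log_w -= log_w.max(axis=1, keepdims=True)                     # stabilize exp
    w = np.exp(log_w)
    w /= w.sum(axis=1, keepdims=True)

    # Velocity each candidate latent would imply at this x_t.
    v = (x_t[:, None, :] - latents[None, :, :]) / t               # (S, N, d)

    # Posterior-weighted variance of the velocity, summed over dimensions.
    v_mean = np.einsum("sn,snd->sd", w, v)
    var = np.einsum("sn,snd->s", w, (v - v_mean[:, None, :]) ** 2)
    return float(var.mean())
```

Under these assumptions, a value of zero means every noisy point determines its velocity uniquely. Larger values mean that different data and noise pairings pass through the same point with different velocities, a spread that no velocity regressor can remove.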
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC





