Global News Digest

arXiv

Diffusion Image Generation with Explicit Modeling of Data Manifold Geometry

Abstract

Generative models for images aim to draw samples from the intrinsic data manifold, which requires learning a dense, compact, low-dimensional parameter space. To address this challenge, we introduce the Data Manifold-aware Image diffusioN moDel (MIND). The framework explicitly captures manifold geometry by embedding discrete patch tokenization directly into the score function of a continuous diffusion model. It thereby combines the structural quantification strengths of discrete tokens with the parallel generation versatility of continuous diffusion processes.

Our approach supports end-to-end differentiable training through a newly developed soft top-$k$ aggregation mechanism. We also incorporate dual-branch high-frequency feature embedding layers to mitigate the spectral bias that transformer backbones typically exhibit when processing low-dimensional inputs. For inference, we design a multi-stage transition sampling scheme that adjusts the sampling strategy according to the current timestep.
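The abstract does not spell out how the soft top-$k$ aggregation is defined. As a rough illustration only, one common way to make a nearest-token lookup differentiable is to select the $k$ closest codebook entries and blend them with softmax weights over negative distances. The sketch below assumes that form; the function name `soft_top_k_aggregate`, the squared Euclidean distance, and the `temperature` parameter are hypothetical choices, not taken from the paper.

```python
import math

def soft_top_k_aggregate(query, codebook, k=2, temperature=1.0):
    """Blend the k codebook entries nearest to `query` using softmax weights.

    Hypothetical sketch: not the mechanism defined in the MIND paper.
    """
    # Squared Euclidean distance from the query to every codebook entry.
    dists = [sum((q - c) ** 2 for q, c in zip(query, entry)) for entry in codebook]
    # Indices of the k nearest entries, closest first.
    top = sorted(range(len(codebook)), key=lambda i: dists[i])[:k]
    # Softmax over negative distances; shifting by the smallest distance
    # keeps the exponentials numerically stable.
    d_min = dists[top[0]]
    scores = [math.exp(-(dists[i] - d_min) / temperature) for i in top]
    total = sum(scores)
    weights = [s / total for s in scores]
    # Weighted average of the selected entries, one coordinate at a time.
    blended = [
        sum(w * codebook[i][d] for w, i in zip(weights, top))
        for d in range(len(query))
    ]
    return blended, dict(zip(top, weights))

# Toy 2-D codebook with four entries (illustrative values only).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
blended, weights = soft_top_k_aggregate([0.2, 0.1], codebook, k=2)
print(blended, weights)
```

With the selected indices held fixed, the output is a smooth function of both the query and the chosen codebook entries, which is what would allow gradients to flow through such a layer. Lowering `temperature` moves the result toward a hard nearest-neighbor lookup.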

We evaluated MIND extensively on ImageNet at a resolution of 256$\times$256. After 80 epochs of training, our base model attained a Fréchet Inception Distance (FID) of 22.73 in an unguided setting, nearly halving the 43.47 FID recorded by the standard DiT-B/2 baseline. Compared against baseline models, our method yielded average FID reductions of 15.95 over DiT and 9.06 over SiT.
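The "nearly halves" claim can be checked directly from the two FID values quoted above. The short calculation below uses only those numbers; the variable and function names are illustrative.

```python
def relative_reduction(baseline, value):
    """Fraction by which `value` improves on `baseline` (lower FID is better)."""
    return (baseline - value) / baseline

dit_b2_fid = 43.47     # DiT-B/2 baseline, unguided, as quoted in the abstract
mind_base_fid = 22.73  # MIND base model, unguided, after 80 epochs

absolute_drop = dit_b2_fid - mind_base_fid
relative_drop = relative_reduction(dit_b2_fid, mind_base_fid)
print(f"absolute drop: {absolute_drop:.2f}, relative drop: {relative_drop:.1%}")
# prints "absolute drop: 20.74, relative drop: 47.7%"
```

A 47.7% relative reduction is consistent with the abstract's wording.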

In guided image generation tasks on ImageNet-256$\times$256, the proposed MIND-B, which comprises only 130M parameters, achieved an FID of 2.06, outperforming LlamaGen-3B, which utilizes 3.1B parameters. Furthermore, our larger MIND-XL variant, containing 715M parameters, pushed the FID down to 1.95. MIND offers a new perspective on diffusion-based image synthesis, laying the groundwork for subsequent advancements in the field. The associated code will be made publicly available.


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
