arXiv

Coarse-to-Fine Compositional Diffusion for Long-Horizon Planning

Abstract:

While diffusion models offer robust priors for generating structured data, numerous applications demand outputs that exceed the scale of standard training regimes. Compositional generation tackles this challenge by stitching together overlapping local plans derived from a pretrained short-horizon prior to construct a long-horizon result. Nevertheless, conventional composition methods mainly ensure agreement between adjacent local plans, securing local consistency without explicitly defining the global architecture of the entire sequence. Consequently, locally compatible segments can still assemble into implausible trajectories, task sequences, or temporal progressions. Although existing approaches attempt to enhance global coherence through iterative propagation of local consistency signals or inference-time optimization, these techniques incur significant computational costs as the volume or dimensionality of local plans grows.
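To make the stitching step concrete, here is a minimal sketch of composing overlapping local segments by averaging them where they overlap. This is a generic illustration of local-consistency composition, not the specific baselines the abstract refers to; the function names `window_starts` and `stitch_by_averaging`, and the one-dimensional scalar "plans", are assumptions made for the example.

```python
def window_starts(total_len, window, stride):
    """Start indices of overlapping windows that cover [0, total_len)."""
    if window > total_len or stride <= 0 or stride > window:
        raise ValueError("need 0 < stride <= window <= total_len")
    starts = list(range(0, total_len - window + 1, stride))
    if starts[-1] + window < total_len:  # make sure the tail is covered
        starts.append(total_len - window)
    return starts


def stitch_by_averaging(segments, starts, total_len):
    """Merge local segments into one sequence, averaging where they overlap."""
    sums = [0.0] * total_len
    counts = [0] * total_len
    for seg, s in zip(segments, starts):
        for offset, value in enumerate(seg):
            sums[s + offset] += value
            counts[s + offset] += 1
    return [total / n for total, n in zip(sums, counts)]


# Four windows of length 4 with stride 2 over a sequence of length 10.
starts = window_starts(10, 4, 2)
# Hypothetical local plans: window i is constant at value i.
segments = [[float(i)] * 4 for i in range(len(starts))]
stitched = stitch_by_averaging(segments, starts, 10)
```

Averaging only reconciles neighbouring windows on their shared positions. Nothing in it constrains how the first window relates to the last, which is the gap in global structure that the abstract describes.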

To address this, we introduce Coarse-to-Fine Compositional Diffusion (CoFi), an inference-time sampling strategy that decouples the establishment of global structure from the refinement of local details. CoFi initially aligns local denoised estimates around a unified coarse framework, creating a global scaffold that encapsulates long-range, task-level arrangements. Subsequently, it diffuses this scaffold to an intermediate noise stage and employs the original pretrained local prior to denoise it. This process restores fine-grained local structures while maintaining the global coherence enforced by the scaffold. In evaluations spanning long-horizon robotic planning, panoramic image synthesis, and long-form video generation, CoFi surpasses previous compositional baselines in both global coherence and local sample quality, all while reducing the number of denoiser evaluations by a factor of 2 to 8.
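The two stages described above can be sketched on a toy one-dimensional signal. This is an illustrative sketch under stated assumptions, not the authors' algorithm: the abstract does not specify how the coarse framework is computed, so block-averaging (`coarse_scaffold`) is an assumed stand-in, and `toy_local_denoiser` is a hypothetical three-tap smoother standing in for a pretrained short-horizon prior (a real denoiser would also be conditioned on the noise level). Only the re-noising step uses a standard formula, the DDPM-style forward process x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps.

```python
import math
import random


def stitch(segments, starts, total_len):
    """Average overlapping local segments into one long sequence."""
    sums, counts = [0.0] * total_len, [0] * total_len
    for seg, s in zip(segments, starts):
        for k, v in enumerate(seg):
            sums[s + k] += v
            counts[s + k] += 1
    return [a / c for a, c in zip(sums, counts)]


def coarse_scaffold(x, factor):
    """ASSUMED coarse operator: block-average, then repeat each block value."""
    out = []
    for i in range(0, len(x), factor):
        block = x[i:i + factor]
        out.extend([sum(block) / len(block)] * len(block))
    return out


def forward_diffuse(x0, alpha_bar, rng):
    """DDPM-style noising: sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * v + b * rng.gauss(0.0, 1.0) for v in x0]


def toy_local_denoiser(window):
    """HYPOTHETICAL stand-in for a pretrained local prior: 3-tap smoothing."""
    out = []
    for i in range(len(window)):
        nbrs = window[max(0, i - 1):i + 2]
        out.append(sum(nbrs) / len(nbrs))
    return out


def coarse_to_fine(x_noisy, starts, window, factor, alpha_bar, seed=0):
    rng = random.Random(seed)
    total = len(x_noisy)
    # 1) Local denoised estimates, one per overlapping window.
    local = [toy_local_denoiser(x_noisy[s:s + window]) for s in starts]
    # 2) Align the estimates around one shared coarse structure (the scaffold).
    scaffold = coarse_scaffold(stitch(local, starts, total), factor)
    # 3) Re-noise the scaffold to an intermediate noise level.
    x_mid = forward_diffuse(scaffold, alpha_bar, rng)
    # 4) Denoise again with the same local prior to restore local detail.
    refined = [toy_local_denoiser(x_mid[s:s + window]) for s in starts]
    return scaffold, stitch(refined, starts, total)


# Usage: length-12 signal, three windows of length 6 that overlap by 3.
rng = random.Random(1)
x_T = [rng.gauss(0.0, 1.0) for _ in range(12)]
scaffold, plan = coarse_to_fine(x_T, starts=[0, 3, 6], window=6,
                                factor=4, alpha_bar=0.5)
```

In this sketch `alpha_bar` controls the trade-off the abstract alludes to: values near 1 keep the result close to the scaffold, while smaller values inject more noise and leave more for the local prior to regenerate.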


Source: arXiv | Generated at: 2026-06-02 00:00:00 UTC
