Coarse-to-Fine Compositional Diffusion for Long-Horizon Planning
Abstract:
While diffusion models offer robust priors for generating structured data, many applications demand outputs larger than those seen during training. Compositional generation tackles this challenge by stitching together overlapping local plans sampled from a pretrained short-horizon prior to construct a long-horizon result. However, conventional composition methods mainly enforce agreement between adjacent local plans, securing local consistency without explicitly specifying the global structure of the entire sequence. Consequently, locally compatible segments can still assemble into implausible trajectories, task sequences, or temporal progressions. Existing approaches attempt to improve global coherence through iterative propagation of local consistency signals or inference-time optimization, but these techniques incur significant computational cost as the number or dimensionality of local plans grows.
To address this, we introduce Coarse-to-Fine Compositional Diffusion (CoFi), an inference-time sampling strategy that decouples the establishment of global structure from the refinement of local details. CoFi initially aligns local denoised estimates around a unified coarse framework, creating a global scaffold that encapsulates long-range, task-level arrangements. Subsequently, it diffuses this scaffold to an intermediate noise stage and employs the original pretrained local prior to denoise it. This process restores fine-grained local structures while maintaining the global coherence enforced by the scaffold. In evaluations spanning long-horizon robotic planning, panoramic image synthesis, and long-form video generation, CoFi surpasses previous compositional baselines in both global coherence and local sample quality, all while reducing the number of denoiser evaluations by a factor of 2 to 8.
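The two-stage idea described above can be illustrated with a toy one-dimensional sketch. This is not the authors' implementation: the abstract does not specify how the scaffold is built or how the noise level is chosen, so everything below is an assumption made for illustration. In particular, `toy_local_denoiser` is a hypothetical stand-in for the pretrained short-horizon prior (it only smooths), the scaffold is assumed to be an average-pooled and re-interpolated copy of the merged local estimates, and `alpha_bar`, `factor`, and `refine_steps` are hypothetical parameters. The sequence is assumed to be a 1-D float array at least as long as one window.

```python
import numpy as np


def window_starts(length, window, stride):
    """Start indices of overlapping windows covering [0, length)."""
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] != length - window:
        starts.append(length - window)  # make sure the tail is covered
    return starts


def toy_local_denoiser(x_window):
    """Hypothetical stand-in for a pretrained short-horizon denoiser.

    It applies a 3-tap moving average; a real prior would be a network.
    """
    padded = np.pad(x_window, 1, mode="edge")
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0


def compose_local_estimates(x, window, stride, denoiser):
    """Denoise each overlapping window and average the overlapping regions."""
    total = np.zeros_like(x)
    count = np.zeros_like(x)
    for s in window_starts(len(x), window, stride):
        total[s:s + window] += denoiser(x[s:s + window])
        count[s:s + window] += 1.0
    return total / count


def coarse_scaffold(x, factor):
    """Assumed coarse structure: average-pool by `factor`, then interpolate back."""
    n = len(x)
    usable = (n // factor) * factor
    pooled = x[:usable].reshape(-1, factor).mean(axis=1)
    centers = np.arange(len(pooled)) * factor + (factor - 1) / 2.0
    return np.interp(np.arange(n), centers, pooled)


def cofi_like_sample(x_noisy, window, stride, factor, alpha_bar, refine_steps, rng):
    """Toy coarse-to-fine pass: build a scaffold, re-noise it, refine locally."""
    # Stage 1 (assumed form): merge local denoised estimates into a coarse scaffold.
    local = compose_local_estimates(x_noisy, window, stride, toy_local_denoiser)
    scaffold = coarse_scaffold(local, factor)
    # Stage 2: diffuse the scaffold to an intermediate noise level.
    eps = rng.standard_normal(len(scaffold))
    x = np.sqrt(alpha_bar) * scaffold + np.sqrt(1.0 - alpha_bar) * eps
    # Stage 3: refine with the local prior applied on overlapping windows.
    for _ in range(refine_steps):
        x = compose_local_estimates(x, window, stride, toy_local_denoiser)
    return scaffold, x


rng = np.random.default_rng(0)
target = np.sin(2 * np.pi * np.linspace(0.0, 1.0, 64))
x_noisy = target + 0.5 * rng.standard_normal(64)
scaffold, plan = cofi_like_sample(
    x_noisy, window=16, stride=8, factor=8, alpha_bar=0.9, refine_steps=2, rng=rng
)
```

In this sketch the scaffold carries only the slow, sequence-wide trend, while the final windowed passes restore detail at the scale the local denoiser was built for. A real system would replace the smoothing stand-in with the pretrained model and follow its actual noise schedule.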
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC





