arXiv

Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation

Title: Causal Forcing++: Enabling Real-Time Interactive Video Generation via Scalable Few-Step Autoregressive Diffusion Distillation

Abstract:

Real-time, interactive video generation requires systems that offer low latency, streaming output, and controllable rollout. Current autoregressive (AR) diffusion distillation techniques perform well in the chunk-wise, 4-step setting, but they are limited by coarse response granularity and substantial sampling latency. This work studies a more demanding regime: frame-wise autoregression with only 1–2 sampling steps. In this regime, we identify the initialization of the few-step AR student as the primary bottleneck. Prior initialization strategies fall short because they are either misaligned with the target, unable to support few-step generation, or too expensive to scale.
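To make the frame-wise, few-step regime concrete, here is a minimal sketch of an AR rollout loop, assuming a hypothetical student that maps a noisy frame straight to a clean-frame estimate conditioned on previously generated frames. The names `toy_student` and `rollout_framewise`, the linear (rectified-flow) noising path with pure noise at t = 1, and the re-noising between steps are illustrative assumptions, not the authors' implementation. Each frame costs exactly `len(step_times)` student calls and is emitted as soon as it is finished, rather than after a whole chunk has gone through all of its denoising steps.

```python
import numpy as np

def toy_student(x_t, t, context):
    """Hypothetical stand-in for a few-step AR student.

    Maps a noisy frame x_t at noise level t directly to a clean-frame
    estimate, conditioned on earlier frames. This toy version simply
    repeats the last generated frame (zeros for the very first frame).
    """
    if not context:
        return np.zeros_like(x_t)
    return context[-1].copy()

def rollout_framewise(student, num_frames, frame_shape, step_times=(1.0, 0.5), seed=0):
    """Generate frames one at a time, using len(step_times) student calls per frame.

    Assumes x_t = (1 - t) * x0 + t * noise, so t = 1 is pure noise.
    Returns the stacked frames and the total number of student calls.
    """
    rng = np.random.default_rng(seed)
    frames, calls = [], 0
    for _ in range(num_frames):
        x = rng.standard_normal(frame_shape)  # start each frame from pure noise
        for i, t in enumerate(step_times):
            x0_hat = student(x, t, frames)  # one jump to a clean estimate
            calls += 1
            if i + 1 < len(step_times):
                # Re-noise the estimate to the next (lower) noise level.
                t_next = step_times[i + 1]
                noise = rng.standard_normal(frame_shape)
                x = (1.0 - t_next) * x0_hat + t_next * noise
            else:
                x = x0_hat
        frames.append(x)  # the frame can be shown to the user right away
    return np.stack(frames), calls

video, calls = rollout_framewise(toy_student, num_frames=5, frame_shape=(4, 4))
print(video.shape, calls)
```

With the default two step times, five frames take ten student calls; passing `step_times=(1.0,)` gives the 1-step variant. An interactive system would additionally feed a user action into each student call, which this sketch omits.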

To address these challenges, we introduce Causal Forcing++, a scalable and principled pipeline that initializes few-step AR models with causal consistency distillation (causal CD). The key idea is that causal CD learns the same AR-conditional flow map as causal ODE distillation, but obtains its supervision from a single online teacher ODE step between adjacent timesteps. This removes the need to precompute and store complete PF-ODE trajectories, making initialization both cheaper and easier to optimize.
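The contrast with ODE distillation can be illustrated with a toy loss term: instead of regressing onto an endpoint taken from a precomputed full PF-ODE trajectory, the student at time t is matched to a target network evaluated after one online teacher step to an adjacent time s < t. The sketch below is a minimal illustration under stated assumptions, not the paper's method: it assumes a rectified-flow parameterization x_t = (1 - t) x0 + t ε, a single Euler teacher step, the standard consistency boundary condition f(x, 0) = x, and a hypothetical toy teacher whose data distribution is concentrated at the mean of the context frames (so its exact flow map is known). All function names are hypothetical, and in real training `target_student` would be a stop-gradient or EMA copy of the student.

```python
import numpy as np

def teacher_velocity(x_t, t, context):
    """Hypothetical toy teacher: exact PF-ODE velocity dx/dt for data
    concentrated at the mean of the context frames, under
    x_t = (1 - t) * x0 + t * eps."""
    x0_star = np.mean(context, axis=0)
    return (x_t - x0_star) / t

def causal_cd_loss(student, target_student, teacher, x0, context, t, s, rng):
    """One causal consistency-distillation loss term for adjacent times s < t.

    Only a single online teacher step from t to s is needed; no full
    trajectory down to t = 0 is precomputed or stored.
    """
    eps = rng.standard_normal(x0.shape)
    x_t = (1.0 - t) * x0 + t * eps                  # noise the current frame
    x_s = x_t + (s - t) * teacher(x_t, t, context)  # one Euler step of the teacher ODE
    pred = student(x_t, t, context)                 # student's clean-frame estimate
    # Boundary condition f(x, 0) = x; otherwise use the (frozen) target network.
    target = x_s if s == 0.0 else target_student(x_s, s, context)
    return float(np.mean((pred - target) ** 2))

# Usage: a student equal to the exact flow map incurs zero loss.
rng = np.random.default_rng(0)
context = [np.ones((4, 4)), 3 * np.ones((4, 4))]       # two previous frames
x0 = np.mean(context, axis=0)                          # toy "data" frame
perfect = lambda x_t, t, ctx: np.mean(ctx, axis=0)     # exact map to t = 0
print(causal_cd_loss(perfect, perfect, teacher_velocity, x0, context, t=0.6, s=0.4, rng=rng))  # 0.0
```

Because each loss term needs only one teacher evaluation per sampled (t, s) pair, supervision can be generated on the fly during training, which is the storage and cost advantage the paragraph above describes.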

In the frame-wise 2-step setting, Causal Forcing++ outperforms the state-of-the-art 4-step chunk-wise Causal Forcing model, improving VBench Total by 0.1, VBench Quality by 0.3, and VisionReward by 0.335. It also reduces first-frame latency by 50% and Stage 2 training cost by roughly 4×. We further demonstrate the versatility of the pipeline by extending it to action-conditioned world-model generation, following the approach of Genie3.

Project Pages: https://github.com/thu-ml/Causal-Forcing and https://github.com/shengshu-ai/minWM


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
