arXiv

Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization

Title: Flash-GRPO: Streamlining Video Diffusion Alignment Through Single-Step Policy Optimization

Abstract: Group Relative Policy Optimization (GRPO) has become a cornerstone for aligning video diffusion models with human preferences, yet it is hindered by a significant computational bottleneck: training a model with 14 billion parameters generally requires hundreds of GPU-days for each experiment. Current efficiency strategies attempt to lower costs through sliding-window subsampling of training timesteps, but these approaches fundamentally degrade optimization quality, leading to severe instability and an inability to achieve optimal trajectory performance. To address these limitations, we introduce Flash-GRPO, a single-step training framework that delivers superior alignment quality compared to full-trajectory training, even within constrained computational budgets, while significantly boosting training efficiency. Flash-GRPO tackles two primary challenges. First, iso-temporal grouping removes timestep-confounded variance by enforcing prompt-wise temporal consistency, thereby decoupling policy performance from the inherent difficulty of specific timesteps. Second, temporal gradient rectification counteracts the time-dependent scaling factor responsible for wildly inconsistent gradient magnitudes across different timesteps. Validated through experiments on models ranging from 1.3B to 14B parameters, Flash-GRPO demonstrates consistent stability, substantial acceleration in training, and state-of-the-art alignment quality.
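
The abstract does not spell out how iso-temporal grouping is implemented. As a rough illustration only, the Python sketch below assumes one plausible reading: every sample generated for the same prompt is trained at a single shared timestep, and rewards are normalized within that prompt's group in the usual GRPO fashion. The function names, the batch layout, and the uniform timestep sampling are hypothetical and are not taken from the paper. Temporal gradient rectification is omitted because the abstract does not state the scaling factor.

```python
import random
import statistics


def sample_shared_timestep(num_steps, rng):
    """Draw ONE timestep index for a whole prompt group.

    Assumption: this is our reading of "prompt-wise temporal consistency";
    the paper's actual sampling scheme may differ.
    """
    return rng.randrange(num_steps)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize rewards within one prompt's group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def build_training_batch(rewards_by_prompt, num_steps, seed=0):
    """Pair each sample with its group's shared timestep and its advantage.

    rewards_by_prompt maps a prompt string to the list of reward scores of
    the samples generated for it (hypothetical layout for illustration).
    """
    rng = random.Random(seed)
    batch = []
    for prompt, rewards in rewards_by_prompt.items():
        timestep = sample_shared_timestep(num_steps, rng)
        for advantage in group_relative_advantages(rewards):
            batch.append(
                {"prompt": prompt, "timestep": timestep, "advantage": advantage}
            )
    return batch


if __name__ == "__main__":
    demo = build_training_batch({"a cat surfing": [1.0, 2.0, 3.0]}, num_steps=50)
    for row in demo:
        print(row)
```

Because all samples in a group share one timestep, differences in their advantages reflect differences in reward rather than differences in how hard that timestep is, which is the decoupling the abstract describes.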


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
