arXiv

Are we really tilting? The mechanics of reward guidance in flow and diffusion models

Title: Is the Tilt Genuine? Unraveling the Mechanics of Reward Guidance in Flow and Diffusion Models

Original: arXiv:2606.02884v1 Announce Type: cross Abstract: Reward guidance algorithms steer a learned generative process toward the reward-tilted measure at inference time. While empirically powerful, these methods are prone to reward hacking: the guided model over-optimizes the reward at the cost of fidelity to the learned distribution. Prior work has attributed this to the complexity of neural reward functions or implicit biases in diffusion training, but its fundamental origins remain poorly understood. We show that reward hacking arises from an approximation made in most practical implementations of reward-guided diffusion -- finite-particle plug-in estimation of the Doob h-function -- even in the simplest non-trivial settings of Gaussian and Gaussian mixture targets with quadratic rewards. In closed form, we isolate two distinct failure modes of the plug-in estimator: it leads to reward hacking within each mode and it cannot select high-reward modes. We propose a closed-form reward damping schedule that corrects the within-mode bias with no additional compute, and clarify the role of best-of-n sampling in compensating for the mode selection failure. Experiments on Gaussian mixture targets, a 2D checkerboard, and FLUX.1 text-to-image generation confirm that our theoretical insights carry over to practical settings.
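The sketch below illustrates the within-mode failure described in the abstract, using the simplest case it names: a 1D Gaussian target with a quadratic reward. Everything in it is an assumption made for illustration, not the paper's code:

- the reward r(x) = -a(x - c)^2 / 2;
- the linear interpolation path X_t = t X_1 + (1 - t) X_0 with X_0 ~ N(0, 1);
- the helper names;
- h_t(x) = E[exp(r(X_1)) | X_t = x] as the h-function;
- the reading of "plug-in" as either evaluating the reward at the posterior mean of X_1, or averaging exp(r) over n reparameterized posterior particles.

Under these assumptions, the exact guidance term d/dx log h_t(x) carries a factor 1/(1 + a v_t), where v_t is the posterior variance of X_1 given X_t. The point plug-in drops this factor, and so does the one-particle estimator in expectation. Both therefore pull samples harder toward the reward optimum than the exact h-transform does.

```python
import math
import random

# Illustrative assumptions (not the paper's code): 1D target X1 ~ N(mu, s2),
# quadratic reward r(x) = -a * (x - c)**2 / 2, and the linear path
# Xt = t * X1 + (1 - t) * X0 with X0 ~ N(0, 1), so t = 1 is the data end.


def reward(x, a, c):
    return -a * (x - c) ** 2 / 2.0


def tilted_target(mu, s2, a, c):
    """Mean and variance of the exact tilt p(x) * exp(r(x)) / Z (precisions add)."""
    prec = 1.0 / s2 + a
    return (mu / s2 + a * c) / prec, 1.0 / prec


def posterior(x, t, mu, s2):
    """Law of X1 given Xt = x under the assumed path: mean m, dm/dx, variance v."""
    d = t * t * s2 + (1.0 - t) ** 2   # Var(Xt)
    slope = t * s2 / d                 # Cov(X1, Xt) / Var(Xt)
    m = mu + slope * (x - t * mu)
    v = s2 * (1.0 - t) ** 2 / d
    return m, slope, v


def log_h_exact(x, t, mu, s2, a, c):
    """log h_t(x) = log E[exp(r(X1)) | Xt = x], closed form for a Gaussian posterior."""
    m, _, v = posterior(x, t, mu, s2)
    return -0.5 * math.log(1.0 + a * v) - a * (m - c) ** 2 / (2.0 * (1.0 + a * v))


def grad_log_h_exact(x, t, mu, s2, a, c):
    """Exact guidance term d/dx log h_t(x); note the 1 / (1 + a*v) factor."""
    m, slope, v = posterior(x, t, mu, s2)
    return -a * (m - c) * slope / (1.0 + a * v)


def grad_log_h_point_plugin(x, t, mu, s2, a, c):
    """Hypothetical point plug-in log h ~ r(m(x)): same gradient without 1 / (1 + a*v)."""
    m, slope, _ = posterior(x, t, mu, s2)
    return -a * (m - c) * slope


def grad_log_h_particles(x, t, mu, s2, a, c, n, rng):
    """Gradient of log((1/n) * sum_i exp(r(X1_i))) with reparameterized particles
    X1_i = m(x) + sqrt(v) * eps_i; v does not depend on x under this path."""
    m, slope, v = posterior(x, t, mu, s2)
    zs = [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for _ in range(n)]
    logw = [reward(z, a, c) for z in zs]
    top = max(logw)
    w = [math.exp(lw - top) for lw in logw]   # softmax weights, shifted for stability
    # dr/dz = -a * (z - c) and dz/dx = slope for every particle
    return slope * sum(-a * wi * (z - c) for wi, z in zip(w, zs)) / sum(w)


if __name__ == "__main__":
    mu, s2, a, c = 0.0, 1.0, 4.0, 2.0
    x, t = 0.3, 0.5
    print("exact tilt (mean, var):", tilted_target(mu, s2, a, c))
    print("exact grad log h:      ", grad_log_h_exact(x, t, mu, s2, a, c))
    print("point plug-in grad:    ", grad_log_h_point_plugin(x, t, mu, s2, a, c))
    rng = random.Random(0)
    for n in (1, 10, 1000):
        avg = sum(grad_log_h_particles(x, t, mu, s2, a, c, n, rng) for _ in range(200)) / 200
        print(f"n = {n:4d} particles, mean estimate: {avg:.3f}")
```

Take mu = 0, s2 = 1, a = 4, c = 2, x = 0.3 and t = 0.5. The posterior variance is then v_t = 0.5, so the point plug-in gradient is 1 + a v_t = 3 times the exact one. The averaged particle estimate sits at the plug-in value for n = 1 and approaches the exact value as n grows. A 1/(1 + a v_t) rescaling of the reward is one natural damping candidate in this toy case. The abstract does not establish whether it matches the paper's closed-form schedule.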

Rewritten: arXiv:2606.02884v1 Announce Type: cross Abstract: At inference time, reward guidance algorithms steer a learned generative process toward a reward-tilted measure. Despite their empirical success, these techniques often suffer from "reward hacking": the guided model over-optimizes the reward and sacrifices fidelity to the learned distribution. Previous studies have linked this issue to the complexity of neural reward functions or to implicit biases in diffusion training, but its root causes have remained unclear. We show that reward hacking stems from an approximation used in most practical implementations of reward-guided diffusion: finite-particle plug-in estimation of the Doob h-function. The problem appears even in the simplest non-trivial settings, namely Gaussian and Gaussian mixture targets with quadratic rewards. Working in closed form, we identify two distinct failure modes of this plug-in estimator: it induces reward hacking within each mode, and it cannot select high-reward modes. To correct the within-mode bias, we introduce a closed-form reward damping schedule that adds no computational cost. We also clarify how best-of-n sampling compensates for the failure to select high-reward modes. Experiments on Gaussian mixture targets, a 2D checkerboard pattern, and FLUX.1 text-to-image generation confirm that these theoretical insights carry over to practical settings.
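The mode-selection failure and the role of best-of-n sampling can be illustrated in the same toy spirit. The sketch below assumes a 1D two-component Gaussian mixture and a hypothetical quadratic reward r(x) = -a(x - c)^2 / 2. Tilting a mixture by exp(r) gives a mixture of tilted components with weights proportional to pi_k Z_k, where Z_k = E_k[exp(r(X))]. These exact weights are what a sampler must reproduce to select modes correctly. For simplicity, best-of-n is applied here to unguided draws from the mixture. This isolates mode selection and is not a description of how the paper combines best-of-n with guidance.

```python
import math
import random

# Illustrative assumptions (not the paper's setup): a 1D Gaussian mixture target
# sum_k pi_k N(mu_k, s2_k) and a quadratic reward r(x) = -a * (x - c)**2 / 2.


def reward(x, a, c):
    return -a * (x - c) ** 2 / 2.0


def component_mass(mu, s2, a, c):
    """Z_k = E[exp(r(X))] for X ~ N(mu, s2), in closed form."""
    return math.exp(-a * (mu - c) ** 2 / (2.0 * (1.0 + a * s2))) / math.sqrt(1.0 + a * s2)


def tilted_mode_weights(pis, mus, s2s, a, c):
    """Mode weights of the exact tilted mixture, proportional to pi_k * Z_k."""
    raw = [p * component_mass(m, s, a, c) for p, m, s in zip(pis, mus, s2s)]
    total = sum(raw)
    return [z / total for z in raw]


def sample_mixture(pis, mus, s2s, rng):
    """One unguided draw from the mixture: returns (mode index, sample)."""
    k = rng.choices(range(len(pis)), weights=pis)[0]
    return k, rng.gauss(mus[k], math.sqrt(s2s[k]))


def best_of_n(pis, mus, s2s, a, c, n, rng):
    """Draw n samples and keep the (mode, sample) pair with the highest reward."""
    draws = [sample_mixture(pis, mus, s2s, rng) for _ in range(n)]
    return max(draws, key=lambda kx: reward(kx[1], a, c))


if __name__ == "__main__":
    pis, mus, s2s, a, c = [0.5, 0.5], [-2.0, 2.0], [0.25, 0.25], 1.0, 2.0
    print("exact tilted mode weights:", tilted_mode_weights(pis, mus, s2s, a, c))
    rng = random.Random(1)
    for n in (1, 2, 8):
        hits = sum(best_of_n(pis, mus, s2s, a, c, n, rng)[0] for _ in range(4000))
        print(f"best-of-{n}: fraction from the high-reward mode = {hits / 4000:.3f}")
```

Use equal weights, modes at -2 and 2, component variance 0.25, a = 1 and c = 2. The exact tilt then puts about 99.8% of its mass on the mode at 2. A single draw lands there only half the time. Keeping the best of 8 draws picks that mode with probability close to 1 - 2^-8, about 0.996. Best-of-n also concentrates samples near the reward optimum within the selected mode, so it is a mode-selection aid rather than an exact sampler of the tilted measure.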


Source: arXiv Generated at: 2026-06-03 00:00:00 UTC
