Gradient Preconditioning for Efficient and Reliable Reward-Guided Generation
Abstract:
This paper introduces a gradient preconditioning technique that improves both the efficiency and the reliability of reward-guided generation with one-step generative models. Optimizing the input noise at test time can substantially improve the outputs of pretrained models. However, this approach is often undermined by "reward hacking", in which the reward increases while the actual output quality degrades, and by computational costs high enough to limit its practical use. To address both issues, we precondition reward gradients by projecting them onto a specifically constructed feasible set of white Gaussian noise. This set is defined by blockwise norm constraints and forms a compact spectral structure that reflects the statistical properties and spatial uncorrelatedness of white Gaussian noise. By aligning gradient updates with the noise direction, the method accelerates and strengthens reward maximization while mitigating the risk of reward hacking. The proposed projection admits a closed-form solution with computational complexity $O(N \log N)$, the same as the Fast Fourier Transform (FFT), and therefore adds minimal overhead. Empirical evaluations on the FLUX model with four different reward models show that our method achieves aesthetic scores comparable to those of current state-of-the-art regularization-based techniques while requiring only 30% of their wall-clock time.
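The abstract states the properties of the projection (blockwise norm constraints, a spectral structure, a closed form, FFT-level cost) but not its exact construction. The sketch below is one hypothetical reading under stated assumptions, not the authors' method. It assumes the feasible set contains the signals whose orthonormal-DFT coefficients, grouped into frequency blocks, each carry exactly the energy expected of unit-variance white Gaussian noise. It also applies the projection after every gradient step (projected gradient ascent) on a 1-D noise vector. The blocking scheme, the function names `frequency_blocks`, `project_to_noise_set` and `projected_ascent`, and the linear toy reward that stands in for FLUX and its reward models are all illustrative assumptions.

```python
import numpy as np


def frequency_blocks(n: int, num_blocks: int) -> np.ndarray:
    """Hypothetical blocking: group DFT indices by folded frequency min(k, n - k).

    Folding puts each conjugate pair (k, n - k) in the same block, so scaling a
    block by a real factor keeps the spectrum Hermitian and the signal real.
    """
    k = np.arange(n)
    folded = np.minimum(k, n - k)                      # values 0 .. n // 2
    edges = np.linspace(0, n // 2 + 1, num_blocks + 1)
    return np.digitize(folded, edges[1:-1])            # block ids 0 .. num_blocks - 1


def project_to_noise_set(z: np.ndarray, blocks: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Euclidean projection onto {x : ||(F x)_b||^2 = |b| for every block b}.

    F is the orthonormal DFT, under which unit-variance white Gaussian noise has
    E|(F x)_k|^2 = 1, so the block size |b| is the expected block energy (an
    assumed target). F is unitary and the blocks partition the spectrum, so the
    nearest point is obtained by rescaling each block onto its own sphere:
    two FFTs plus O(N) bookkeeping, i.e. O(N log N) overall.
    """
    spec = np.fft.fft(z, norm="ortho")
    energy = np.bincount(blocks, weights=np.abs(spec) ** 2)   # ||(F z)_b||^2
    size = np.bincount(blocks).astype(float)                  # |b|
    scale = np.sqrt(size) / np.maximum(np.sqrt(energy), eps)
    # The rescaled spectrum is Hermitian, so the imaginary part is round-off only.
    return np.fft.ifft(spec * scale[blocks], norm="ortho").real


def projected_ascent(z0, grad_fn, blocks, lr=0.1, steps=100):
    """Toy test-time noise optimization: ascend the reward, then re-project."""
    z = project_to_noise_set(z0, blocks)
    for _ in range(steps):
        z = project_to_noise_set(z + lr * grad_fn(z), blocks)
    return z


# Usage with a toy linear reward r(z) = w . z, whose gradient is simply w.
rng = np.random.default_rng(0)
n = 256
blocks = frequency_blocks(n, num_blocks=8)
w = rng.standard_normal(n)
z0 = rng.standard_normal(n)
z = projected_ascent(z0, lambda _z: w, blocks)

print("reward before:", w @ project_to_noise_set(z0, blocks))
print("reward after: ", w @ z)
print("||z||^2:", z @ z)              # equals n up to round-off
```

Because every block is pinned to its expected energy, the optimized vector keeps the total energy $\|z\|^2 = N$ of typical noise no matter how hard the reward pushes. Unconstrained ascent on the same linear reward would instead make $\|z\|$ grow without bound, which is a toy analogue of drifting away from the noise distribution.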
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC