arXiv

Lookahead Sample Reward Guidance for Test-Time Scaling of Diffusion Models

June 2, 2026 · Yeongmin Kim, Donghyeok Shin, Byeonghu Na, Minsang Park, Richard Lee Kim, Il-Chul Moon

Title: LiDAR Sampling: Efficient Test-Time Scaling for Diffusion Models via Lookahead Reward Guidance

Abstract: Diffusion models have strong generative capabilities, but their outputs often fail to fully capture human intent. This study introduces an efficient test-time scaling technique that samples from regions with higher human-aligned reward. Current methods for estimating the Expected Future Reward (EFR) have significant drawbacks: backward rollouts incur excessive sampling cost, whereas Tweedie-based techniques, such as Sequential Monte Carlo and gradient guidance, suffer from bias and fundamental sampling difficulties. We demonstrate that the EFR at any intermediate state $\mathbf{x}_t$ can be computed exclusively from marginal samples of a pre-trained diffusion model, which yields closed-form reward guidance that requires no neural backpropagation. To improve computational efficiency further, we propose a few-step lookahead sampling mechanism paired with an accurate solver that directs particles toward high-reward lookahead samples. We term this approach LiDAR sampling. LiDAR matches the GenEval performance of the most recent gradient guidance method for SDXL while running 9.5x faster. The source code is available at https://github.com/aailab-kaist/Diffusion-LiDAR-Sampling.
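One way to read the claim that the EFR can be computed from marginal samples alone is via Bayes' rule: since $p(\mathbf{x}_0 \mid \mathbf{x}_t) \propto p(\mathbf{x}_0)\, q(\mathbf{x}_t \mid \mathbf{x}_0)$, an expectation over the posterior can be rewritten as a ratio of Gaussian-weighted sums over samples $\mathbf{x}_0^{(i)} \sim p(\mathbf{x}_0)$. The following is a minimal NumPy sketch of that reading, not the authors' implementation. It assumes a forward process $\mathbf{x}_t = \alpha_t \mathbf{x}_0 + \sigma_t \boldsymbol{\epsilon}$, an EFR of the form $\log \mathbb{E}[\exp(r(\mathbf{x}_0)/\lambda) \mid \mathbf{x}_t]$, and precomputed rewards; the function name, the temperature `lam`, and all argument names are hypothetical.

```python
import numpy as np


def _logsumexp(a):
    # Numerically stable log(sum(exp(a))).
    m = a.max()
    return m + np.log(np.exp(a - m).sum())


def _softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()


def efr_and_guidance(x_t, x0_samples, rewards, alpha_t, sigma_t, lam=1.0):
    """Monte Carlo estimate of EFR(x_t) and its gradient w.r.t. x_t.

    Assumptions (not taken from the paper):
      - forward kernel q(x_t | x0) = N(alpha_t * x0, sigma_t^2 * I)
      - EFR(x_t) = log E[exp(r(x0) / lam) | x_t]
      - x0_samples are marginal samples of shape (N, D), rewards has shape (N,)
    """
    diff = x_t[None, :] - alpha_t * x0_samples             # (N, D)
    # log q(x_t | x0_i) up to a constant that cancels in the ratio.
    log_q = -0.5 * (diff ** 2).sum(axis=1) / sigma_t ** 2  # (N,)
    tilted = log_q + rewards / lam

    efr = _logsumexp(tilted) - _logsumexp(log_q)

    # Gradient of a log-sum-exp is a softmax-weighted sum of the gradients
    # of its terms; the x_t terms cancel between numerator and denominator,
    # leaving a difference of two weighted means of the x0 samples.
    w_tilt = _softmax(tilted)   # reward-tilted posterior weights
    w_post = _softmax(log_q)    # plain posterior weights
    grad = (alpha_t / sigma_t ** 2) * (w_tilt @ x0_samples - w_post @ x0_samples)
    return efr, grad
```

In this sketch the gradient involves only the samples, their rewards, and Gaussian weights, so no backpropagation through a reward or denoising network is needed. With a constant reward the two weight vectors coincide and the guidance term vanishes, as expected.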

Source: arXiv