arXiv

Drifting Preference Optimization for One-Step Generative Models

Abstract:

Deterministic one-step text-to-image models offer significant deployment advantages because they produce an image in a single forward pass. However, aligning these models through preference fine-tuning is challenging: conventional alignment techniques typically depend on policy likelihoods, denoising trajectories, differentiable reward gradients, or test-time optimization. To avoid these dependencies, we introduce Drifting Preference Optimization (DrPO), an online preference fine-tuning approach tailored to deterministic one-step generators.

DrPO generates candidate images for each prompt with the current model and ranks them by a target reward. It then uses both high- and low-scoring samples to construct an update direction in feature space. This update combines a non-parametric dipole preference field with a reference drift derived from the frozen base generator, and the generator is optimized by regressing onto a detached feature-space target. Because the target reward is used only for ranking, DrPO supports training with large-scale, black-box, or non-differentiable rewards while keeping inference to a single generator call.
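The abstract does not specify the exact form of the dipole field, how many high- and low-scoring samples are used, or how the reference drift is weighted, so the following is only a minimal pure-Python sketch of the general idea under stated assumptions. It assumes a Gaussian-kernel mean-shift form for the dipole field (pull toward the top-`k` candidates minus pull toward the bottom-`k`), a reference drift equal to the frozen base generator's feature minus the current feature, and a linear combination of the two. Feature vectors are plain lists standing in for the output of some image feature extractor. All names and parameters (`kernel_mean_shift`, `drpo_targets`, `regression_grad`, `k`, `step`, `ref_weight`, `bandwidth`) are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence

Vec = List[float]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vec:
    return [x - y for x, y in zip(a, b)]


def _add(a: Sequence[float], b: Sequence[float]) -> Vec:
    return [x + y for x, y in zip(a, b)]


def _scale(a: Sequence[float], s: float) -> Vec:
    return [x * s for x in a]


def _sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kernel_mean_shift(x: Vec, anchors: List[Vec], bandwidth: float) -> Vec:
    """Gaussian-kernel-weighted average of (anchor - x): a pull toward nearby anchors."""
    weights = [math.exp(-_sqdist(x, a) / (2.0 * bandwidth ** 2)) for a in anchors]
    total = sum(weights)
    shift = [0.0] * len(x)
    if total == 0.0:  # every anchor is so far away that exp() underflowed
        return shift
    for w, a in zip(weights, anchors):
        shift = _add(shift, _scale(_sub(a, x), w / total))
    return shift


def drpo_targets(feats: List[Vec], base_feats: List[Vec], rewards: Sequence[float],
                 k: int = 1, step: float = 0.5, ref_weight: float = 0.1,
                 bandwidth: float = 1.0) -> List[Vec]:
    """Hypothetical per-candidate regression targets for one prompt.

    feats:      features of the current generator's candidates
    base_feats: features of the frozen base generator's outputs (assumed paired)
    rewards:    target-reward scores; only their ORDER is used
    """
    order = sorted(range(len(rewards)), key=lambda i: rewards[i])  # ascending reward
    low = [feats[i] for i in order[:k]]    # k lowest-scoring candidates
    high = [feats[i] for i in order[-k:]]  # k highest-scoring candidates
    targets = []
    for f, f_base in zip(feats, base_feats):
        # Dipole field: attraction toward winners minus attraction toward losers.
        dipole = _sub(kernel_mean_shift(f, high, bandwidth),
                      kernel_mean_shift(f, low, bandwidth))
        # Reference drift: pull back toward the frozen base generator.
        ref_drift = _sub(f_base, f)
        drift = _add(dipole, _scale(ref_drift, ref_weight))
        targets.append(_add(f, _scale(drift, step)))
    return targets  # treated as constants ("detached") by the loss below


def regression_loss(feats: List[Vec], targets: List[Vec]) -> float:
    """Mean squared distance from each feature to its (constant) target."""
    return sum(_sqdist(f, t) for f, t in zip(feats, targets)) / len(feats)


def regression_grad(feats: List[Vec], targets: List[Vec]) -> List[Vec]:
    """d(loss)/d(feats) with targets held fixed: 2 (f - t) / n = -2 * step * drift / n."""
    n = len(feats)
    return [_scale(_sub(f, t), 2.0 / n) for f, t in zip(feats, targets)]


if __name__ == "__main__":
    feats = [[0.0], [1.0], [2.0]]
    print(drpo_targets(feats, feats, [0.1, 0.5, 0.9]))  # [[1.0], [2.0], [3.0]]
```

Because `drpo_targets` only consults the sort order of `rewards`, replacing the scores with any strictly increasing transform of them (for example `math.exp`) yields identical targets, which illustrates why a black-box or non-differentiable scorer is sufficient. In an autodiff framework, the detached target would typically be realized with `torch.Tensor.detach()` or `jax.lax.stop_gradient`, so that, as in `regression_grad`, the gradient flows only through the generator's features and not through the target construction.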

We evaluated DrPO on SD-Turbo and SDXL-Turbo across several benchmarks and target rewards, including HPSv3 and GenEval. DrPO improves alignment over reward-gradient-free one-step preference baselines. Furthermore, because it requires no backpropagation through the reward model, DrPO reduces HPSv3 training compute by $3.51\times$ at a matched effective batch size. Preliminary offline experiments also suggest that sample-based gradient synthesis may extend beyond online reward ranking.


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
