arXiv

Lagrangian Perturbation Diffusion Steering: Latent Reinforcement Learning for Generative Policies

June 2, 2026 · Hikmet Simsir, Ozgur S. Oguz

Abstract:

Behavior cloning with high-capacity generative policies provides strong imitation, but its effectiveness is often limited by incomplete demonstration coverage and by distribution shift. Fine-tuning with reinforcement learning can improve performance, yet updating large action decoders is often unstable and sample-inefficient. To address these challenges, we introduce Lagrangian Perturbation Diffusion Steering (LP-DS), a lightweight adaptation method that improves a frozen generative policy by learning a compact perturbation of its noise-space input before decoding. The perturbation is optimized with a Lagrangian trust-region objective that increases downstream value while strictly limiting deviation from the latent prior.
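The steering idea above can be illustrated with a toy, stdlib-only Python sketch. It is not the authors' implementation: the decoder (`decode`), critic (`critic`), the target action `TARGET`, the standard Gaussian latent prior N(0, I), the use of KL(N(δ, I) ‖ N(0, I)) = ½‖δ‖² as the deviation measure, the single state-independent perturbation δ, and all hyperparameters are assumptions chosen for illustration. The sketch alternates a primal gradient-ascent step on δ with a dual ascent step on the multiplier λ, so λ grows only while the deviation budget is exceeded.

```python
import math
import random

# Hypothetical toy setup (not from the paper): a 2-D action space,
# a fixed "good" action TARGET, and a standard Gaussian latent prior N(0, I).
TARGET = [0.8, -0.5]


def decode(z):
    """Stand-in for the frozen generative decoder: elementwise tanh of the noise."""
    return [math.tanh(zi) for zi in z]


def critic(a):
    """Toy value function: higher when the decoded action is closer to TARGET."""
    return -sum((ai - ti) ** 2 for ai, ti in zip(a, TARGET))


def critic_grad_wrt_z(z):
    """Analytic gradient of critic(decode(z)) with respect to the noise z."""
    a = decode(z)
    return [-2.0 * (ai - ti) * (1.0 - ai * ai) for ai, ti in zip(a, TARGET)]


def kl_to_prior(delta):
    """KL(N(delta, I) || N(0, I)) = 0.5 * ||delta||^2, the assumed deviation measure."""
    return 0.5 * sum(d * d for d in delta)


def lagrangian_steer(budget=0.1, steps=2000, lr=0.05, dual_lr=0.5,
                     n_samples=64, seed=0):
    """Maximize E[Q(decode(eps + delta))] subject to KL(delta) <= budget."""
    rng = random.Random(seed)
    dim = len(TARGET)
    # Fixed Monte Carlo samples from the latent prior, reused every step.
    noises = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_samples)]
    delta = [0.0] * dim  # perturbation; delta = 0 reproduces the frozen policy
    lam = 0.0            # Lagrange multiplier for the trust-region constraint
    for _ in range(steps):
        # Primal step: gradient ascent on E[Q] - lam * KL with respect to delta.
        grad = [0.0] * dim
        for eps in noises:
            g = critic_grad_wrt_z([e + d for e, d in zip(eps, delta)])
            grad = [gi + gq / n_samples for gi, gq in zip(grad, g)]
        # The gradient of lam * 0.5 * ||delta||^2 is lam * delta.
        delta = [d + lr * (gd - lam * d) for d, gd in zip(delta, grad)]
        # Dual step: lam rises while KL exceeds the budget, falls (clamped at 0) otherwise.
        lam = max(0.0, lam + dual_lr * (kl_to_prior(delta) - budget))
    return delta, lam, noises


def mean_value(delta, noises):
    """Average critic value of actions decoded from the perturbed noise samples."""
    return sum(critic(decode([e + d for e, d in zip(eps, delta)]))
               for eps in noises) / len(noises)


delta, lam, noises = lagrangian_steer()
print("perturbation:", [round(d, 3) for d in delta])
print("KL to prior:", round(kl_to_prior(delta), 4), "lambda:", round(lam, 3))
print("mean value at prior:", round(mean_value([0.0, 0.0], noises), 4))
print("mean value steered: ", round(mean_value(delta, noises), 4))
```

Because the decoder stays frozen, only the small vector `delta` and the scalar λ are updated. In the setting the abstract describes, the perturbation would presumably be produced by a learned, state-conditioned network rather than a single vector, which this sketch omits.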

Across RoboMimic manipulation, OpenAI Gym locomotion, and Adroit dexterous-manipulation benchmarks, LP-DS achieves better sample efficiency, success rates, and returns than existing methods. Notably, it maintains higher action-space entropy than unconstrained noise-space steering and improves returns by up to 25% over prior baselines. Additional experiments with flow-matching backbones, a large vision-language-action model, and physical deployment on a Franka robot indicate that LP-DS applies beyond simulated benchmarks and compact diffusion policies.

Project page: https://sites.google.com/view/lp-ds/home.

Source: arXiv Generated at: 2026-06-02 00:00:00 UTC