arXiv

Coupled Local and Global World Models for Efficient First Order RL

June 3, 2026 · Joseph Amigo, Rooholla Khorrambakht, Nicolas Mansard, Ludovic Righetti

Title: Efficient First-Order Reinforcement Learning via Coupled Local and Global World Models

Abstract:

Standard simulators often struggle with rich sensory data, such as visual perception, and with complex physical phenomena such as non-rigid bodies and contact interactions. World models offer a compelling alternative because they can capture these dynamics more accurately. However, their high computational cost at evaluation time has hindered their adoption in popular reinforcement learning (RL) frameworks. These frameworks have used simulators successfully to solve complex locomotion tasks, but they remain ineffective for manipulation tasks.

To address this, we propose a methodology that eliminates the need for simulators altogether, training RL policies directly within world models derived from real-world robotic interactions. The foundation of our approach is a novel decoupled first-order gradient (FoG) technique, which facilitates policy training using large-scale diffusion models. This method leverages a comprehensive world model to produce precise forward trajectories, while a streamlined latent-space surrogate approximates local dynamics to enable efficient gradient calculation. By integrating these local and global world models, we achieve high-fidelity trajectory unrolling that remains computationally feasible for differentiation.
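The split between forward unrolling and gradient computation can be illustrated with a minimal scalar sketch. It is not the authors' implementation: `global_step`, `surrogate_grads`, the linear policy `a = theta * x`, and the quadratic tracking loss are all hypothetical stand-ins chosen for brevity. States come only from the "global" model, while the backward pass applies the chain rule with Jacobians supplied by the cheap "local" surrogate, evaluated along the global trajectory.

```python
import math


def global_step(x, a):
    """Hypothetical stand-in for the expensive world model (forward only)."""
    return 0.9 * x + 0.5 * math.tanh(a)


def surrogate_grads(x, a):
    """Hypothetical cheap local surrogate: returns (dx'/dx, dx'/da).

    Deliberately approximate: it linearises tanh(a) as a, so dx'/da = 0.5.
    """
    return 0.9, 0.5


def decoupled_gradient(theta, x0, target, horizon):
    """Return (loss, dloss/dtheta) for the assumed policy a = theta * x."""
    # Forward pass: the trajectory is produced by the global model only.
    xs, acts = [x0], []
    for _ in range(horizon):
        a = theta * xs[-1]
        acts.append(a)
        xs.append(global_step(xs[-1], a))
    loss = sum((x - target) ** 2 for x in xs[1:])

    # Backward pass: chain rule with surrogate Jacobians, evaluated at the
    # states and actions visited by the global model.
    grad_theta = 0.0
    carried = 0.0  # gradient flowing back from later time steps
    for t in reversed(range(horizon)):
        dl_dx_next = 2.0 * (xs[t + 1] - target) + carried
        fx, fa = surrogate_grads(xs[t], acts[t])
        grad_theta += dl_dx_next * fa * xs[t]      # da/dtheta = x_t
        carried = dl_dx_next * (fx + fa * theta)   # da/dx = theta
    return loss, grad_theta


# Usage: a few steps of plain gradient descent on the policy parameter.
theta = 0.0
for _ in range(20):
    loss, grad = decoupled_gradient(theta, x0=1.0, target=0.0, horizon=5)
    theta -= 0.05 * grad
```

In this toy setting the surrogate Jacobian is only approximate away from `a = 0`, yet it still points the update in a useful direction because it is evaluated on states produced by the more accurate forward model. The paper's surrogate operates in a latent space and its world model is a diffusion model. Neither is represented here.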

We validated our method on the Push-T manipulation task, where it demonstrated superior sample efficiency compared to Proximal Policy Optimization (PPO). Additionally, we assessed the approach in an ego-centric object manipulation scenario involving a quadruped robot. These findings underscore the potential of learning within data-driven world models as a viable strategy for tackling difficult, image-based RL tasks that are challenging to model using traditional, hand-crafted physics simulators.
