arXiv

Unifying Model-Free Efficiency and Model-Based Representations via Latent Dynamics

June 4, 2026 · Jashaswimalya Acharjee, Balaraman Ravindran

Title: Bridging Model-Free Efficiency and Model-Based Representations Through Latent Dynamics

Abstract: This paper introduces Unified Latent Dynamics (ULD), a reinforcement learning framework that combines the computational efficiency of model-free methods with the representational strength of model-based methods while avoiding the cost of explicit planning. ULD embeds state-action pairs into a latent space in which the true value function is approximately linear, and it works well with a single set of hyperparameters across domains ranging from continuous control with low-dimensional or pixel-based inputs to high-dimensional Atari environments. Theoretically, we show that, under mild conditions, the fixed point of our embedding-based temporal-difference updates aligns with that of a corresponding linear model-based value expansion. We also establish explicit error bounds that link the fidelity of the embedding to the quality of value approximation. In implementation, ULD uses synchronized updates for its encoder, value, and policy networks, adds auxiliary losses that refine short-horizon predictive dynamics, and applies reward-scale normalization to keep learning stable, particularly under sparse rewards. Our evaluation across 80 environments, including Gym locomotion, DeepMind Control (both proprioceptive and visual modalities), and Atari, shows that ULD matches or outperforms specialized model-free and general model-based baselines. The approach achieves cross-domain proficiency with minimal hyperparameter tuning and a significantly smaller parameter footprint. These findings suggest that value-aligned latent representations alone are sufficient to provide the adaptability and sample efficiency usually associated with full model-based planning systems.
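The fixed-point claim in the abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation or their proof: it uses a hand-written 3-state Markov reward process under a fixed policy, a fixed 2-dimensional embedding `PHI` standing in for a learned encoder, and uniform state weighting, all of which are assumptions made here for illustration. It compares (a) the fixed point of expected linear temporal-difference updates on the embedding with (b) the value weights obtained by first fitting a linear latent dynamics and reward model by least squares and then solving that model exactly. Every name and number in the block is hypothetical.

```python
# Toy check: linear TD fixed point vs. linear latent-model solution.
# All matrices below are invented for illustration; none come from the paper.

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for i in range(n):
            if i != c:
                f = M[i][c]
                M[i] = [vi - f * vc for vi, vc in zip(M[i], M[c])]
    return [row[n:] for row in M]

GAMMA = 0.9
P = [[0.5, 0.5, 0.0],      # transition matrix under a fixed policy (assumed)
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
R = [[0.0], [1.0], [2.0]]  # expected reward per state, as a column vector
PHI = [[1.0, 0.0],         # one embedding row per state (assumed, not learned)
       [0.5, 0.5],
       [0.0, 1.0]]

PHI_T = transpose(PHI)
P_PHI = matmul(P, PHI)     # expected next embedding for each state

# (a) TD fixed point: solve PHI^T (PHI - gamma * P PHI) w = PHI^T R.
td_diff = [[x - GAMMA * y for x, y in zip(rx, ry)] for rx, ry in zip(PHI, P_PHI)]
A = matmul(PHI_T, td_diff)
b = matmul(PHI_T, R)
w_td = solve(A, b)

# (b) Linear latent model: least-squares fit of PHI F ~ P PHI and PHI r_lat ~ R,
#     then solve the model's Bellman equation w = r_lat + gamma * F w.
G = matmul(PHI_T, PHI)
F = solve(G, matmul(PHI_T, P_PHI))
r_lat = solve(G, b)
k = len(F)
I_minus_gF = [[(1.0 if i == j else 0.0) - GAMMA * F[i][j] for j in range(k)]
              for i in range(k)]
w_model = solve(I_minus_gF, r_lat)

# Expected TD iteration w <- w + lr * (b - A w); step size 0.5 is an assumption.
w_iter = [[0.0] for _ in range(k)]
for _ in range(500):
    Aw = matmul(A, w_iter)
    w_iter = [[w[0] + 0.5 * (bi[0] - awi[0])] for w, bi, awi in zip(w_iter, b, Aw)]

print("TD fixed point:     ", [w[0] for w in w_td])
print("Latent-model values:", [w[0] for w in w_model])
print("Iterated TD:        ", [w[0] for w in w_iter])
```

On this example the three weight vectors agree up to numerical error, which is the linear-case analogue of the equivalence the abstract describes. The paper's setting (learned encoders, state-action inputs, and its specific conditions) is more general than this sketch.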
