arXiv

Improving Diffusion Planners by Self-Supervised Action Gating with Energies

June 2, 2026 · Yuan Lu, Dongqi Han, Yansen Wang, Dongsheng Li

Title: Enhancing Diffusion Planners via Self-Supervised Action Gating with Energies

Abstract: While diffusion planners represent a potent strategy for offline reinforcement learning, they remain susceptible to failure when value-driven selection prioritizes trajectories that achieve high scores but exhibit local inconsistencies with environmental dynamics, leading to fragile execution. To address this, we introduce Self-supervised Action Gating with Energies (SAGE), an inference-time re-ranking technique designed to penalize dynamically inconsistent plans by leveraging a latent consistency signal. The SAGE framework involves training a Joint-Embedding Predictive Architecture (JEPA) encoder on offline state sequences, alongside an action-conditioned latent predictor aimed at modeling short-horizon transitions. During testing, SAGE calculates an energy value for each sampled candidate based on its latent prediction error, integrating this measure of feasibility with value estimates to determine the final action selection. SAGE is compatible with existing diffusion planning architectures capable of trajectory sampling and value-based action selection, requiring neither environment rollouts nor policy re-training. Empirical evaluations across locomotion, navigation, and manipulation benchmarks demonstrate that SAGE enhances both the performance and robustness of diffusion planners.
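The re-ranking step described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the names `latent_energy`, `select_plan`, `encode`, and `predict` are hypothetical, the energy is taken to be a mean squared latent prediction error, and the value and energy are combined as `value - weight * energy`, which is only one plausible way to integrate the two signals. The toy encoder and predictor stand in for the trained JEPA encoder and the action-conditioned latent predictor.

```python
import numpy as np


def latent_energy(states, actions, encode, predict):
    """Mean squared latent prediction error along one candidate plan.

    states:  array of shape (H + 1, state_dim), the planned states
    actions: array of shape (H, action_dim), the planned actions
    encode, predict: hypothetical stand-ins for the trained encoder and
    the action-conditioned latent predictor.
    """
    z = encode(states)                 # latents for every planned state
    z_pred = predict(z[:-1], actions)  # one-step predictions of the next latent
    per_step = np.sum((z_pred - z[1:]) ** 2, axis=-1)
    return float(np.mean(per_step))


def select_plan(candidates, values, encode, predict, weight=1.0):
    """Re-rank sampled plans by value minus weighted energy (assumed form).

    Returns the index of the best candidate and the per-candidate energies.
    """
    energies = np.array(
        [latent_energy(s, a, encode, predict) for s, a in candidates]
    )
    scores = np.asarray(values, dtype=float) - weight * energies
    return int(np.argmax(scores)), energies


# Toy stand-ins (assumptions): identity encoder, additive dynamics s' = s + a.
def encode(states):
    return states


def predict(z, actions):
    return z + actions


actions = np.full((3, 2), 0.1)
states_ok = np.vstack([np.zeros((1, 2)), np.cumsum(actions, axis=0)])
states_bad = states_ok.copy()
states_bad[2] += 1.0  # a jump that the planned actions cannot explain

candidates = [(states_bad, actions), (states_ok, actions)]
values = [1.2, 1.0]  # the inconsistent plan has the higher value estimate

best, energies = select_plan(candidates, values, encode, predict, weight=1.0)
best_value_only, _ = select_plan(candidates, values, encode, predict, weight=0.0)
```

In this toy setup, value-only selection (`weight=0.0`) picks the inconsistent plan, while the energy penalty shifts the choice to the dynamically consistent one. Nothing in the sketch requires environment rollouts or changes to the planner, which mirrors the inference-time nature of the method.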

Source: arXiv