arXiv

Dual Advantage Fields

June 4, 2026 · Alexey Zemtsov, Maxim Bobrin, Alexander Nikulin, Dmitry V. Dylov, Fakhri Karray, Vladislav Kurenkov, Martin Takáč, Arip Asadulaev

Abstract: Offline goal-conditioned reinforcement learning demands both estimates of reachability over long horizons and the ability to compare local actions. While dual goal representations generate value fields capable of capturing global reachability, they fail to explicitly dictate which action is optimal at a specific state. To address this, we introduce Dual Advantage Fields (DAF), a method for extracting policies that converts a bilinear dual value model into a local advantage signal. Within the framework of bilinear dual parameterization, the goal embedding is defined as the gradient of the value field relative to the state representation. DAF utilizes an action-effect model to forecast the discounted feature displacement resulting from an action, subsequently evaluating actions based on how well this displacement aligns with the goal direction. In realizable scenarios, this scoring mechanism corresponds to the goal-conditioned Bellman advantage, ensuring a standard guarantee for local policy improvement. Evaluations across OGBench locomotion, manipulation, and puzzle tasks demonstrate that DAF enhances aggregate RLiable metrics and excels in environments where the locally optimal actions diverge from direct trajectories toward the ultimate goal.
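The scoring idea in the abstract can be illustrated with a minimal sketch. It assumes the bilinear form V(s, g) = <phi(s), psi(g)>, so that the gradient of V with respect to phi(s) is exactly psi(g). The abstract does not give the exact form of the action-effect model, so the displacement vectors below are hand-written stand-ins for its predictions, and the names `dot`, `bilinear_value`, and `advantage_scores` are hypothetical, not taken from the paper.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def bilinear_value(phi_s: Sequence[float], psi_g: Sequence[float]) -> float:
    """Assumed bilinear dual value: V(s, g) = <phi(s), psi(g)>."""
    return dot(phi_s, psi_g)


def advantage_scores(
    displacements: Sequence[Sequence[float]], psi_g: Sequence[float]
) -> List[float]:
    """Score each action by how well its predicted feature displacement
    aligns with the goal embedding psi(g), i.e. the gradient of V
    with respect to phi(s) under the bilinear assumption."""
    return [dot(d, psi_g) for d in displacements]


# Toy 2-D feature space (all numbers are illustrative assumptions).
phi_s = [0.0, 0.0]   # current state representation
psi_g = [1.0, 0.5]   # goal embedding
displacements = [    # stand-ins for action-effect model outputs
    [0.2, 0.0],      # action 0
    [0.0, 0.2],      # action 1
    [-0.2, 0.0],     # action 2
]

scores = advantage_scores(displacements, psi_g)
best_action = max(range(len(scores)), key=lambda i: scores[i])
```

Because V is linear in phi(s) under this assumption, each score equals the change in value produced by moving the state representation along the corresponding displacement, which is why the score can serve as a local action-comparison signal.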

Source: arXiv Generated at: 2026-06-04 00:00:00 UTC