Local Guidance, Global Impact: Gaussian-Reshaped Trust Region Unlocks Behavior Transitions
Abstract: Although Proximal Policy Optimization (PPO) excels in stationary contexts, our findings indicate that its conventional optimization framework falters in continual and non-stationary settings. This performance gap is not caused by limited model capacity or excessively tight clipping mechanisms. Rather, PPO relies on persistent, directionally inefficient local updates, revealing an absence of geometry-aware guidance needed to accumulate substantial behavioral shifts and facilitate transitions to new patterns. While regularization techniques based on divergence offer some geometric insight, their monotonically rising penalties inadvertently suppress significant policy deviations, even when such changes are essential for effective adaptation. To overcome this hurdle, we introduce Gaussian Trust Region Policy Optimization (GTR), a method that reformulates the trust region via a Gaussian kernel. This approach yields a constraint that is both bounded and non-monotonic, ensuring robust local stability while gradually loosening restrictions during sustained high-advantage updates. Additionally, we present a Mixture Gaussian Anchor that adjusts to recent policy trajectories, thereby mitigating variance caused by outdated references. GTR is independent of specific architectures and delivers strong results across various domains, including video games, simulated robotic control, open-world exploration, and post-training of language models. These outcomes suggest that designing trust regions with geometric awareness offers a viable path toward robust reinforcement learning in complex, non-stationary environments. Our code is accessible at https://anonymous.4open.science/r/GTR_demo/README.md.
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC
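
The abstract contrasts divergence-based penalties, which rise monotonically, with a Gaussian-kernel trust region that is bounded and non-monotonic. The sketch below illustrates that contrast only. It is not the authors' formulation, which the abstract does not state. The functions `quadratic_penalty`, `gaussian_penalty`, and `gaussian_penalty_grad`, the deviation `d` (for example, a log-probability ratio between the new policy and an anchor), and the width `sigma` are all hypothetical names and assumptions introduced here.

```python
import math


def quadratic_penalty(d: float) -> float:
    """Divergence-style stand-in: grows without bound as |d| increases."""
    return 0.5 * d * d


def gaussian_penalty(d: float, sigma: float = 1.0) -> float:
    """Assumed Gaussian-kernel penalty 1 - exp(-d^2 / (2 sigma^2)).

    It is zero at d = 0 and never exceeds 1, so large deviations
    are not punished without limit.
    """
    return 1.0 - math.exp(-(d * d) / (2.0 * sigma * sigma))


def gaussian_penalty_grad(d: float, sigma: float = 1.0) -> float:
    """Derivative of gaussian_penalty with respect to d.

    The restoring force is non-monotonic: it grows up to |d| = sigma
    and then decays toward zero for larger deviations.
    """
    return (d / (sigma * sigma)) * math.exp(-(d * d) / (2.0 * sigma * sigma))


if __name__ == "__main__":
    # Print both penalties and the Gaussian restoring force side by side.
    for d in (0.0, 0.5, 1.0, 2.0, 4.0):
        print(
            f"d={d:.1f}  quadratic={quadratic_penalty(d):.3f}  "
            f"gaussian={gaussian_penalty(d):.3f}  "
            f"gaussian_grad={gaussian_penalty_grad(d):.3f}"
        )
```

Under these assumptions, small deviations are pulled back toward the anchor much as with a quadratic penalty, while the pull weakens once a deviation exceeds `sigma`. This is one way to read the abstract's claim that restrictions loosen during sustained high-advantage updates.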



