Post-Hoc Robustness for Model-Based Reinforcement Learning
Abstract
To make reinforcement learning (RL) more practical in real-world settings, adversarially robust RL trains agents that can withstand adversarial perturbations of the environment. In this framework, the agent optimizes its policy against perturbations chosen by an adversary, which yields a zero-sum Markov game. When adversarial robustness is combined with model-based RL, the adversary can perturb the learned transition model rather than the training environment itself. Building on this idea, we propose a method for post-hoc robustification of deep RL agents at inference time. Using the learned model together with a pre-trained nominal policy, our technique performs a robust policy improvement step, which aims to improve robustness without further neural network training. Specifically, we employ model-predictive control with adversarial rollouts, approximated by projected gradient descent within a bounded uncertainty set. These offline rollouts are designed to mitigate out-of-distribution issues. Evaluations in perturbed Gymnasium MuJoCo environments show significant robustness gains while respecting the computational constraints of post-hoc inference.
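The idea of model-predictive control over adversarial rollouts can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes a toy scalar "learned model", a quadratic reward, a linear nominal policy, an L-infinity uncertainty set of radius `EPS`, a small hand-picked set of candidate first actions, and finite-difference gradients in place of automatic differentiation. All names (`model_step`, `adversarial_return`, `robust_action`, and so on) are hypothetical, and the out-of-distribution handling mentioned above is omitted.

```python
# Illustrative sketch only: every constant and function here is an assumption.
EPS = 0.1      # radius of the assumed L-infinity uncertainty set
HORIZON = 5    # rollout length inside the learned model


def model_step(s, a, d):
    """Toy stand-in for a learned transition model, with additive perturbation d."""
    return 0.9 * s + 0.5 * a + d


def reward(s, a):
    """Assumed quadratic reward: stay near zero with small actions."""
    return -(s * s + 0.1 * a * a)


def nominal_policy(s):
    """Stand-in for the pre-trained nominal policy."""
    return -0.5 * s


def rollout_return(s0, first_action, deltas):
    """Return of a model rollout: candidate first action, then the nominal policy."""
    s, total = s0, 0.0
    for t, d in enumerate(deltas):
        a = first_action if t == 0 else nominal_policy(s)
        total += reward(s, a)
        s = model_step(s, a, d)
    return total + reward(s, 0.0)  # penalize the final state as well


def finite_diff_grad(s0, first_action, deltas, h=1e-4):
    """Central finite-difference gradient of the return w.r.t. each perturbation."""
    grads = []
    for t in range(len(deltas)):
        up, down = list(deltas), list(deltas)
        up[t] += h
        down[t] -= h
        diff = rollout_return(s0, first_action, up) - rollout_return(s0, first_action, down)
        grads.append(diff / (2 * h))
    return grads


def sign(x):
    return (x > 0) - (x < 0)


def adversarial_return(s0, first_action, steps=20, lr=0.05):
    """Approximate the worst-case return with projected (signed) gradient descent."""
    deltas = [0.0] * HORIZON
    worst = rollout_return(s0, first_action, deltas)
    for _ in range(steps):
        grads = finite_diff_grad(s0, first_action, deltas)
        # The adversary descends on the agent's return, then projects onto the ball.
        deltas = [max(-EPS, min(EPS, d - lr * sign(g))) for d, g in zip(deltas, grads)]
        worst = min(worst, rollout_return(s0, first_action, deltas))
    return worst


def robust_action(s0, offsets=(-0.4, -0.2, 0.0, 0.2, 0.4)):
    """Pick the candidate first action with the best approximate worst-case return."""
    base = nominal_policy(s0)
    candidates = [base + o for o in offsets]
    return max(candidates, key=lambda a: adversarial_return(s0, a))
```

Because the nominal action is itself one of the candidates (offset `0.0`), the selected action's approximate worst-case return is never lower than that of the nominal action under the same adversary. In this sketch, that is the sense in which the step acts as a policy improvement without any retraining.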
Source: arXiv



