Interpretable Policy Distillation for Power Grid Topology Control
Abstract:
Deep reinforcement learning (RL) is a promising approach to real-time power grid operation, but deploying large neural policies is hindered by high inference cost, the difficulty of fitting them on constrained hardware, and a lack of transparency for human operators. This study investigates whether a Proximal Policy Optimization (PPO) agent for grid topology control can be compressed into compact, tree-based surrogate models without loss of operational performance. We train a PPO "teacher" in the standard 14-bus Grid2Op environment, using a stability-focused reward function and data collection strategies that emphasize critical, high-loading states. The policy is then distilled into both a decision tree and a random forest.
Evaluations on held-out validation episodes show that both surrogates outperform the teacher in mean reward and survival duration while requiring a fraction of the inference cost. The decision tree in particular shows high exact-action agreement with the PPO argmax policy and near-complete agreement when top-ranked actions are considered, yet remains small enough to inspect directly. Feature-importance analysis reveals a significant representational shift: whereas the PPO policy depends largely on line-loading signals, the distilled tree is driven primarily by bus-topology variables. These findings indicate that stress-focused distillation can turn a black-box neural controller into a lightweight, auditable, rule-based surrogate suited to real-time deployment. However, the results also highlight potential risks from deterministic action selection and topology-specific generalization.
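The two agreement measures mentioned above can be illustrated with a minimal sketch. The abstract does not define them, so the definitions here are assumptions: exact agreement counts states where the surrogate's action equals the teacher's argmax, and top-k agreement counts states where the surrogate's action is among the teacher's k highest-scored actions. The function names, the value of k, and the toy scores are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def exact_agreement(teacher_scores: Sequence[Sequence[float]],
                    student_actions: Sequence[int]) -> float:
    """Fraction of states where the student picks the teacher's argmax action.

    Assumed definition; the abstract does not spell out the metric.
    """
    if not student_actions or len(teacher_scores) != len(student_actions):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = 0
    for scores, action in zip(teacher_scores, student_actions):
        # Teacher's greedy action: index of the highest score (first on ties).
        teacher_action = max(range(len(scores)), key=scores.__getitem__)
        hits += int(action == teacher_action)
    return hits / len(student_actions)


def top_k_agreement(teacher_scores: Sequence[Sequence[float]],
                    student_actions: Sequence[int],
                    k: int = 3) -> float:
    """Fraction of states where the student's action is in the teacher's top k.

    Assumed definition; k is a hypothetical parameter.
    """
    if not student_actions or len(teacher_scores) != len(student_actions):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = 0
    for scores, action in zip(teacher_scores, student_actions):
        # Action indices ordered from highest to lowest teacher score.
        ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
        hits += int(action in ranked[:k])
    return hits / len(student_actions)


# Toy data: four states, three candidate actions each (illustrative only).
teacher_scores = [
    [0.70, 0.20, 0.10],
    [0.10, 0.60, 0.30],
    [0.20, 0.30, 0.50],
    [0.40, 0.35, 0.25],
]
student_actions = [0, 2, 2, 1]

exact = exact_agreement(teacher_scores, student_actions)
top2 = top_k_agreement(teacher_scores, student_actions, k=2)
```

On this toy data the surrogate matches the teacher's greedy action in two of four states, while every surrogate action falls within the teacher's two best-scored actions, which shows how the two measures can diverge.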
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




