Strongly Polynomial Time Complexity of Policy Iteration for $L_\infty$ Robust MDPs
Title: Strongly Polynomial Time Complexity of Policy Iteration for $L_\infty$ Robust MDPs
Abstract:
Markov decision processes (MDPs) serve as a cornerstone for sequential decision-making frameworks. Robust MDPs (RMDPs) expand upon this foundation by incorporating uncertainty into transition probabilities, thereby optimizing outcomes against the most adverse scenario. Specifically, $(s, a)$-rectangular RMDPs utilizing $L_\infty$ uncertainty sets represent a highly expressive and fundamental model; they encompass both classical MDPs and turn-based stochastic games. This study focuses on this model under discounted payoff structures. A critical open question in this domain has been whether polynomial-time or strongly-polynomial-time algorithms exist for these optimization problems. While linear programming provides polynomial-time solutions for standard MDPs regardless of the discount factor, Ye’s pioneering research demonstrated that strongly-polynomial time complexity is achievable for a fixed discount factor. Extending these findings to RMDPs had long remained an unresolved challenge. In this paper, we demonstrate that a robust policy iteration method operates in strongly-polynomial time for $(s, a)$-rectangular $L_\infty$ RMDPs with a constant discount factor, thereby settling a significant algorithmic inquiry.
Source: arXiv Generated at: 2026-06-03 00:00:00 UTC



