arXiv

Policy Gradient for Continuous-Time Robust Markov Decision Processes

Title: Leveraging Policy Gradient Methods for Continuous-Time Robust Markov Decision Processes

Abstract:

This study applies policy gradient algorithms to continuous-time Robust Markov Decision Processes (RMDPs). RMDPs are an established framework for designing reinforcement learning agents that keep performance guarantees under worst-case transition dynamics, but prior work has focused primarily on discrete-time systems and on sample-efficient policy gradients in that setting. This paper extends the analysis to continuous-time dynamics.

We derive policy gradients and adversarial gradients using pathwise and adjoint-based formulations, for both stochastic and ordinary differential equations. We then introduce two optimization strategies. First, we propose double-loop optimizers that achieve linear convergence in the oracle-based setting and an $\tilde{\mathcal{O}}(\frac{1}{\epsilon^2})$ sample complexity in the sample-based setting. This analysis also contributes novel theoretical tools for undiscounted total-cost MDPs. Second, we introduce mean-field optimizers, which function as distributional optimizers. These achieve an $\tilde{\mathcal{O}}(\frac{1}{K})$ convergence rate in the oracle-based setting and an $\tilde{\mathcal{O}}(\frac{N^2}{\epsilon})$ sample complexity under an $N$-particle approximation.

Both optimization approaches are validated empirically on continuous-time RMDPs with neural ordinary differential equation dynamics.
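
To make the double-loop idea in the abstract concrete, the following is a minimal sketch under strong simplifying assumptions; it is not the paper's algorithm. It assumes a scalar linear ODE dx/dt = (a + xi) x + b u with linear feedback u = -theta x, a quadratic running cost, and an interval uncertainty set |xi| <= rho. The gradients are pathwise in the sense that they are obtained by differentiating the Euler-discretized trajectory with forward sensitivities. The function names (rollout, worst_case, robust_policy_gradient) and every constant are hypothetical and chosen only for illustration.

```python
def rollout(theta, xi, a=0.5, b=1.0, r=0.1, x0=1.0, T=2.0, n=200):
    """Euler rollout of dx/dt = (a + xi) x + b u with u = -theta x.

    Returns the discretized cost J = sum (x^2 + r u^2) dt together with
    its pathwise derivatives (dJ/dtheta, dJ/dxi).
    All dynamics and constants here are illustrative assumptions.
    """
    dt = T / n
    x, s_th, s_xi = x0, 0.0, 0.0      # state and sensitivities dx/dtheta, dx/dxi
    J = g_th = g_xi = 0.0
    k = a + xi - b * theta            # closed-loop growth rate
    w = 1.0 + r * theta ** 2          # running cost equals w * x^2
    for _ in range(n):
        J += w * x * x * dt
        g_th += (2.0 * w * x * s_th + 2.0 * r * theta * x * x) * dt
        g_xi += 2.0 * w * x * s_xi * dt
        # Differentiate the Euler step x <- x + dt * k * x w.r.t. theta and xi.
        x, s_th, s_xi = (
            x + dt * k * x,
            s_th + dt * (k * s_th - b * x),
            s_xi + dt * (k * s_xi + x),
        )
    return J, g_th, g_xi


def worst_case(theta, rho=0.3, steps=20, lr=0.5):
    """Inner loop: projected gradient ascent of the adversary over |xi| <= rho."""
    xi = 0.0
    for _ in range(steps):
        _, _, g_xi = rollout(theta, xi)
        xi = max(-rho, min(rho, xi + lr * g_xi))
    return xi


def robust_policy_gradient(theta=0.0, outer=100, lr=0.05):
    """Outer loop: gradient descent on theta against the current worst-case xi."""
    for _ in range(outer):
        xi = worst_case(theta)
        _, g_th, _ = rollout(theta, xi)
        theta -= lr * g_th
    return theta


if __name__ == "__main__":
    theta_star = robust_policy_gradient()
    xi_star = worst_case(theta_star)
    print("theta:", theta_star, "worst-case xi:", xi_star,
          "worst-case cost:", rollout(theta_star, xi_star)[0])
```

In this toy problem the adversary's best response always sits at the boundary xi = rho, so the inner loop is cheap. With neural ODE dynamics, the forward sensitivities used here would typically be replaced by automatic differentiation or an adjoint solve.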


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
