Interaction-Limited Safe Continuous-Time RL for Dynamical Medical Treatment
Original: arXiv:2606.01051v1. Abstract: Dynamic medical treatment requires deciding treatment intensity and intervention timing, while patient states evolve continuously and adverse events may occur between clinical interactions. Most existing treatment learning methods assume fixed schedules or enforce safety only at discrete decision points. We propose Interaction-Limited Safe Continuous-Time Reinforcement Learning, a framework that jointly optimizes treatment administration and clinical interaction timing under trajectory-level safety constraints. Our key idea is to reformulate the continuous-time treatment problem as an option-based semi-Markov decision process, where each option specifies a continuous-time treatment policy and its duration. We develop a safety-tightening mechanism showing that suitably constructed constraints at interaction times guarantee safety over the full continuous-time trajectory with high probability. We further establish finite-sample guarantees for policy learning from logged treatment trajectories and introduce a practical data-driven conservative surrogate. Experiments show that the proposed adaptive interaction-timing mechanism improves both safety and treatment effectiveness over equidistant interaction schemes across different safe policy optimization methods.
Rewrite:
Dynamic medical care requires deciding both how intensively to treat and when to intervene, while patients’ conditions evolve continuously and adverse events can occur between clinical interactions. Most existing approaches to learning treatment strategies assume fixed schedules or enforce safety only at discrete decision points. To address this, we introduce Interaction-Limited Safe Continuous-Time Reinforcement Learning, a framework that jointly optimizes treatment administration and the timing of clinical interactions subject to trajectory-level safety constraints. The core idea is to reformulate the continuous-time treatment problem as an option-based semi-Markov decision process, in which each option specifies a continuous-time treatment policy together with its duration. We develop a safety-tightening mechanism showing that suitably constructed constraints imposed at interaction times guarantee, with high probability, safety over the full continuous-time trajectory. We further establish finite-sample guarantees for learning policies from logged treatment trajectories and propose a practical, data-driven conservative surrogate. Experiments show that the adaptive interaction-timing mechanism improves both safety and treatment effectiveness compared with equidistant (fixed-interval) interaction schemes, and that this improvement holds across several different safe policy optimization methods.
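The interplay between option duration and a tightened constraint can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the authors' construction: a scalar risk marker following a made-up stochastic differential equation whose drift is bounded above by `B * dose`, a safety threshold `X_MAX`, a margin built from that drift bound plus the reflection principle for Brownian motion, and a greedy rule that picks the feasible (dose, duration) option delivering the most total dose. The names `margin`, `is_safe`, `choose_option`, and `violation_rate` are hypothetical. The check runs only at the interaction time, yet under these assumptions it bounds the probability of crossing the threshold at any point before the next interaction by `delta`; over K interactions, a union bound gives at most K times `delta`.

```python
import math
import random
from statistics import NormalDist

# Hypothetical toy model, NOT the paper's: a scalar risk marker x(t) with
#   dx = (B * dose - A * max(x, 0)) dt + SIGMA dW,
# where the dose is held constant while an option runs. The drift is therefore
# bounded above by B * dose, which is the only property the margin relies on.
A, B, SIGMA = 0.5, 1.0, 0.3
X_MAX = 2.0  # assumed safety threshold: x(t) must stay at or below X_MAX


def margin(dose: float, duration: float, delta: float) -> float:
    """High-probability bound on the rise of x during one option.

    Pathwise, x(t) - x_k <= B*dose*t + SIGMA*W_t, and by the reflection principle
    P(max_{s<=tau} W_s >= z*sqrt(tau)) = 2*(1 - Phi(z)), which equals delta here.
    """
    z = NormalDist().inv_cdf(1.0 - delta / 2.0)
    return B * dose * duration + z * SIGMA * math.sqrt(duration)


def is_safe(x: float, dose: float, duration: float, delta: float) -> bool:
    """Tightened constraint, checked only at the interaction time t_k."""
    return x + margin(dose, duration, delta) <= X_MAX


def choose_option(x, options, delta=0.05):
    """Pick the feasible (dose, duration) option that delivers the most total dose.

    If no option passes the tightened check, fall back to the lowest-dose,
    shortest-duration option (a hypothetical rule: re-check as soon as possible).
    """
    feasible = [(d, tau) for d, tau in options if is_safe(x, d, tau, delta)]
    if not feasible:
        return min(options)  # tuples compare by dose first, then duration
    return max(feasible, key=lambda o: o[0] * o[1])


def violation_rate(x0, dose, duration, n_paths=2000, dt=0.01, seed=0):
    """Monte Carlo estimate of P(max_t x(t) > X_MAX) under one option (Euler-Maruyama)."""
    rng = random.Random(seed)
    n_steps = round(duration / dt)
    violations = 0
    for _ in range(n_paths):
        x = x0
        for _ in range(n_steps):
            drift = B * dose - A * max(x, 0.0)
            x += drift * dt + SIGMA * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            if x > X_MAX:
                violations += 1
                break
    return violations / n_paths


if __name__ == "__main__":
    options = [(d, tau) for d in (0.0, 0.5, 1.0) for tau in (0.25, 0.5, 1.0, 2.0)]
    for x in (0.2, 1.0, 1.6):
        dose, tau = choose_option(x, options)
        rate = violation_rate(x, dose, tau)
        print(f"x={x}: dose={dose}, next interaction in {tau}, est. violation rate={rate:.4f}")
```

Because the margin grows with both dose and duration, the feasible set shrinks as the marker approaches `X_MAX`, so the greedy rule moves toward lower doses and earlier check-ins. This is the intuition behind adaptive rather than equidistant interaction timing. The Monte Carlo estimate is only a sanity check of the bound under the toy dynamics.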
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





