arXiv

Scenario Generation for Risk-Aware Reinforcement Learning with Probably Approximately Safe Guarantees

Abstract:

Safety is essential when deploying reinforcement learning (RL) agents in practical applications, because policies learned with deep RL can be vulnerable to transition perturbations that lead to unpredictable or hazardous outcomes. One approach to verifying policy safety is to generate probabilistic barrier certificates by sampling trajectories and checking them against safety constraints, which separates behavior established as safe from behavior whose safety is unknown. However, deriving tight upper and lower bounds on the probability of constraint violation becomes challenging when a policy is sensitive to transition uncertainties that drive the agent into sparsely explored regions of the state space.
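As a rough illustration of what sampled upper and lower bounds on a violation probability can look like, the sketch below applies a two-sided Hoeffding interval to Monte Carlo rollouts. This is a generic textbook bound, not the certificate construction of the paper. The toy random-walk environment and the names `violation_bounds` and `rollout_violates` are assumptions made purely for illustration.

```python
import math
import random


def violation_bounds(violations, n, delta=0.05):
    """Two-sided Hoeffding interval for a violation probability.

    With probability at least 1 - delta, the true violation probability
    lies in the returned [lower, upper] interval (assumes i.i.d. rollouts).
    """
    p_hat = violations / n
    eps = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    return max(0.0, p_hat - eps), min(1.0, p_hat + eps)


def rollout_violates(rng, horizon=20, noise=0.1, limit=1.0):
    """Hypothetical toy rollout: a 1-D random walk standing in for a policy
    under transition noise. A violation occurs if |x| ever exceeds `limit`."""
    x = 0.0
    for _ in range(horizon):
        x += rng.gauss(0.0, noise)
        if abs(x) > limit:
            return True
    return False


rng = random.Random(0)
n = 2000
violations = sum(rollout_violates(rng) for _ in range(n))
lower, upper = violation_bounds(violations, n)
print(f"estimated violation probability in [{lower:.3f}, {upper:.3f}]")
```

The interval width shrinks only as the number of rollouts grows, and a bound of this kind assumes the rollouts represent the states the policy will actually visit. That assumption is what weakens when perturbations push the agent into sparsely explored regions.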

To mitigate this issue, we use a variational autoencoder (VAE) to approximate the distribution of the states the agent encounters. Using the latent representations of these states, we construct upper-bound and lower-bound barrier certificates that are optimized to identify regions of safe behavior with high confidence. We formulate this as a dual optimization problem in which the lower-bound barrier certificate gives a more conservative estimate of the safe region than the upper-bound certificate. During training, we sample states from the set difference between the two bounds, which represents the non-robust region, and use them to refine the bounds and obtain sharper probabilistic safety guarantees. We outline the guarantees established and validate the tightness of our bounds experimentally.
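The idea of sampling from the set difference between the two certificates can be sketched with simple rejection sampling. Everything here is a hypothetical stand-in: the barrier functions `b_lower` and `b_upper` are hand-written quadratics rather than learned certificates, a state counts as certified safe when the barrier value is non-negative (a sign convention assumed for this sketch), and sampling is done directly in a 2-D state space rather than through a VAE latent space.

```python
import random


def b_lower(x, y):
    """Hypothetical conservative certificate: safe inside radius 0.5."""
    return 0.25 - (x * x + y * y)


def b_upper(x, y):
    """Hypothetical optimistic certificate: safe inside radius 1.0."""
    return 1.0 - (x * x + y * y)


def sample_non_robust(n, rng, max_tries=100_000):
    """Rejection-sample states that the upper-bound certificate accepts but
    the lower-bound certificate does not (the set difference)."""
    samples = []
    tries = 0
    while len(samples) < n and tries < max_tries:
        tries += 1
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if b_upper(x, y) >= 0.0 and b_lower(x, y) < 0.0:
            samples.append((x, y))
    return samples


points = sample_non_robust(100, random.Random(0))
print(len(points), "states drawn from the non-robust region")
```

In a training loop, states like these would be the ones used to tighten the two certificates, since they are exactly where the optimistic and conservative estimates disagree.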


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
