arXiv

COP-Q: Safety-First Reinforcement Learning for Robot Control via Cholesky-Ordered Projection

Title: COP-Q: Prioritizing Safety in Reinforcement Learning for Robot Control through Cholesky-Ordered Projection

Abstract: In safe robot control, the central challenge is to maximize cumulative return while strictly satisfying safety constraints. Traditional off-policy safe reinforcement learning methods typically train separate critic ensembles to learn the reward and safety Q-values independently, handling the uncertainty of each objective in isolation. This fragmented, objective-wise treatment ignores correlations between the objectives and often yields overly conservative value estimates that reduce sample efficiency. To address these limitations, we introduce Cholesky-Ordered Projection Q-learning (COP-Q), a safety-centric framework that incorporates inter-objective covariance into vector-valued Q-value estimation. COP-Q constructs a generalized confidence bound in the joint Q-value space and uses Cholesky factorization to encode objective priorities sequentially. This mechanism preserves the conservatism needed for safety while dynamically reducing unnecessary caution in the reward objective. The refined estimates are then used both in temporal-difference target computation and in actor optimization. COP-Q adds negligible computational overhead and can be integrated into most contemporary deep Q-learning architectures. Empirical evaluations on robot locomotion tasks in Brax and safe navigation tasks in Safety-Gymnasium, covering both hard and soft safety regimes, show that COP-Q achieves robust safety together with sample efficiency that matches or exceeds established baseline methods.
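
The abstract does not spell out the projection itself, so the following is only a minimal sketch of one plausible reading of the Cholesky-ordering idea, under these assumptions: an ensemble of critics each outputs a vector [cost Q, reward Q]; their joint spread is summarized by the ensemble covariance Σ; objectives are ordered by priority with safety first; and each objective's pessimistic bound uses the matching diagonal entry of the Cholesky factor L (Σ = L Lᵀ), which equals that objective's standard deviation conditioned on all higher-priority objectives. The cost bound therefore keeps its full marginal spread, while reward uncertainty that is already explained by its correlation with cost uncertainty no longer adds to the reward's pessimism. The function names, the confidence width `beta`, the sign convention, and the TD-target helper are hypothetical; this is not the authors' COP-Q update.

```python
import math
from typing import List, Sequence

# Hypothetical illustration of a Cholesky-ordered confidence bound; NOT the
# authors' COP-Q implementation. Objectives are ordered by priority, safety first.


def mean_and_covariance(samples: Sequence[Sequence[float]]):
    """Ensemble mean and unbiased covariance; samples[i][k] = critic i, objective k."""
    n, d = len(samples), len(samples[0])
    if n < 2:
        raise ValueError("need at least two critics to estimate a covariance")
    mean = [sum(s[k] for s in samples) / n for k in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for s in samples:
        for a in range(d):
            for b in range(d):
                cov[a][b] += (s[a] - mean[a]) * (s[b] - mean[b])
    return mean, [[c / (n - 1) for c in row] for row in cov]


def cholesky(cov: List[List[float]], jitter: float = 1e-8) -> List[List[float]]:
    """Lower-triangular L with L L^T ~= cov; jitter keeps the diagonal positive."""
    d = len(cov)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(max(cov[i][i] + jitter - s, jitter))
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def ordered_bounds(samples, signs, beta=1.0):
    """Pessimistic bounds using L[k][k], the spread of objective k conditioned
    on all higher-priority objectives. signs[k] = +1 if larger is worse (cost),
    -1 if larger is better (reward)."""
    mean, cov = mean_and_covariance(samples)
    L = cholesky(cov)
    return [mean[k] + signs[k] * beta * L[k][k] for k in range(len(mean))]


def independent_bounds(samples, signs, beta=1.0):
    """Objective-wise baseline: each bound uses only its own marginal spread."""
    mean, cov = mean_and_covariance(samples)
    return [mean[k] + signs[k] * beta * math.sqrt(cov[k][k]) for k in range(len(mean))]


def td_targets(immediate, next_bounds, gamma=0.99, done=False):
    """Vector TD target y_k = signal_k + gamma * (1 - done) * bound_k."""
    return [x + gamma * (0.0 if done else 1.0) * b for x, b in zip(immediate, next_bounds)]


# Four critics, each predicting [cost Q, reward Q] for the same next state-action.
samples = [[1.0, 10.0], [2.0, 13.0], [3.0, 14.0], [4.0, 17.0]]
signs = [+1, -1]  # cost: upper bound; reward: lower bound

ordered = ordered_bounds(samples, signs)
independent = independent_bounds(samples, signs)
print([round(x, 3) for x in ordered])      # [3.791, 12.984]
print([round(x, 3) for x in independent])  # [3.791, 10.613]
print(td_targets([0.0, 1.0], ordered))
```

With these strongly correlated critics, the safety (cost) bound is the same as in the objective-wise baseline, while the reward bound is far less pessimistic, which mirrors the qualitative behavior the abstract describes. The `td_targets` helper only indicates where such bounds would enter a bootstrapped update; how COP-Q uses them in the actor loss is not specified here.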


Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
