arXiv

RUBAS: Rubric-Based Reinforcement Learning for Agent Safety

Title: RUBAS: Applying Rubric-Based Reinforcement Learning to Enhance Agent Safety

Abstract:

As Large Language Models (LLMs) evolve into agents equipped with external tools, they introduce a novel category of safety challenges rooted in real-world execution, distinct from the risks associated with simple text generation. Current alignment techniques frequently depend on broad refusal mechanisms or static oversight, which complicates the effort to balance safety with effective tool usage across a wide spectrum of agentic hazards. To address this, we present RUBAS, a reinforcement learning framework for agent safety grounded in specific rubrics. RUBAS breaks down agent conduct into four distinct categories: argument safety, helpfulness, tool-use safety, and response safety. By offering fine-grained and interpretable rewards throughout entire agent trajectories, these structured rubrics allow reinforcement learning to optimize for safe tool interaction without compromising task success. Comprehensive evaluations across various models and agent safety benchmarks demonstrate that RUBAS surpasses standard alignment baselines in safety performance, lowers the incidence of tool-grounded hallucinations, and retains competitive utility. These findings indicate that employing multi-dimensional rubric rewards serves as a potent training signal for aligning LLM agents in safety-critical environments involving tool use.
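To make the idea of a multi-dimensional rubric reward concrete, the following is a minimal sketch of how per-dimension scores over a trajectory might be combined into one scalar reward. Only the four dimension names come from the abstract. The `RubricScores` type, the `rubric_reward` function, the [0, 1] score range, and the weighted-mean aggregation are all illustrative assumptions, not the paper's actual formulation.

```python
from dataclasses import dataclass

# The four rubric dimensions named in the abstract.
DIMENSIONS = ("argument_safety", "helpfulness", "tool_use_safety", "response_safety")


@dataclass(frozen=True)
class RubricScores:
    """Hypothetical per-trajectory scores, each assumed to lie in [0, 1]."""
    argument_safety: float
    helpfulness: float
    tool_use_safety: float
    response_safety: float


def rubric_reward(scores, weights=None):
    """Combine rubric scores into a scalar reward via a weighted mean.

    The weighted mean is an assumed aggregation chosen for illustration;
    the abstract does not state how the dimensions are combined.
    """
    if weights is None:
        # Assumption: equal weight for every dimension by default.
        weights = {name: 1.0 for name in DIMENSIONS}
    for name in DIMENSIONS:
        value = getattr(scores, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    total_weight = sum(weights[name] for name in DIMENSIONS)
    weighted_sum = sum(weights[name] * getattr(scores, name) for name in DIMENSIONS)
    return weighted_sum / total_weight
```

Because each dimension is scored separately, a low reward can be traced back to the dimension that caused it (for example, unsafe tool arguments versus an unhelpful final response), which is the kind of interpretability the abstract attributes to structured rubrics.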


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
