arXiv

RDA: Reward Design Agent for Reinforcement Learning

Original (arXiv:2606.01672v1): Reinforcement learning has enabled the acquisition of impressive robotic skills, but typically requires hand-crafted reward functions that are slow to design and difficult to align with human intentions. Recent work, such as Eureka, automates reward design by using an LLM to iteratively generate and refine reward code from task descriptions. However, they rely on coarse feedback signals such as success rate, which provide little semantic insight into the learned behavior. As a result, their trained policies achieve the final goal but are frequently poorly aligned with task instructions. We introduce the Reward Design Agent (RDA), a VLM-based agentic framework that injects semantic understanding into reward design. RDA decomposes tasks, visually evaluates trajectories, summarizes failure modes, and iteratively revises reward code to better align with task instructions. Across 12 tabletop manipulation tasks from ManiSkill and 4 whole-body manipulation tasks from HumanoidBench, RDA produces policies substantially more instruction-aligned than those of other baselines, while achieving comparable task success rates. Videos and the generated reward code are available on https://nitinkamra1992.github.io/reward-design-agent.

Rewrite:

Reinforcement learning has enabled impressive robotic skills, but it typically depends on hand-crafted reward functions that are slow to design and difficult to align with human intent. Recent approaches such as Eureka automate this process by using a large language model (LLM) to iteratively generate and refine reward code from a task description. These methods, however, rely on coarse feedback signals such as success rate, which reveal little about what the learned behavior actually looks like. As a result, the trained policies often achieve the final goal while remaining poorly aligned with the task instructions.

To address these issues, we present the Reward Design Agent (RDA), an agentic framework built on vision-language models (VLMs) that brings semantic understanding into reward design. RDA decomposes the task, visually evaluates trajectories, summarizes failure modes, and iteratively revises the reward code so that it aligns more closely with the task instructions. Across 12 tabletop manipulation tasks from ManiSkill and 4 whole-body manipulation tasks from HumanoidBench, RDA produces policies that are substantially more instruction-aligned than those of the baselines, while achieving comparable task success rates. Videos and the generated reward code are available at https://nitinkamra1992.github.io/reward-design-agent.
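
The four steps named in the abstract (decompose, visually evaluate, summarize failures, revise) can be pictured as a simple control loop. The sketch below is only an illustration of that loop under stated assumptions, not the authors' implementation: the function `design_reward`, the `Evaluation` and `DesignState` types, and the four callables (`decompose`, `train_and_render`, `evaluate`, `revise`) are all hypothetical names, and the VLM, LLM, and RL training calls are left as caller-supplied stubs.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Evaluation:
    """Semantic feedback for one subtask (hypothetical structure)."""
    subtask: str
    satisfied: bool
    note: str = ""


@dataclass
class DesignState:
    """Current reward code plus the per-round evaluations collected so far."""
    reward_code: str
    history: List[List[Evaluation]] = field(default_factory=list)


def design_reward(
    instruction: str,
    decompose: Callable[[str], List[str]],            # assumed: VLM/LLM task decomposition
    train_and_render: Callable[[str], object],        # assumed: RL training + rollout rendering
    evaluate: Callable[[object, str], Evaluation],    # assumed: VLM judgment per subtask
    revise: Callable[[str, List[str]], str],          # assumed: LLM reward-code revision
    initial_reward_code: str,
    max_rounds: int = 3,
) -> DesignState:
    """Iterate reward code until every subtask is judged satisfied or rounds run out."""
    state = DesignState(reward_code=initial_reward_code)
    subtasks = decompose(instruction)
    for _ in range(max_rounds):
        # Train a policy with the current reward and obtain a trajectory to inspect.
        trajectory = train_and_render(state.reward_code)
        # Judge each subtask separately instead of using a single success rate.
        evals = [evaluate(trajectory, subtask) for subtask in subtasks]
        state.history.append(evals)
        # Summarize the failure modes as short text for the revision step.
        failures = [f"{e.subtask}: {e.note}" for e in evals if not e.satisfied]
        if not failures:
            break
        state.reward_code = revise(state.reward_code, failures)
    return state
```

The point of the sketch is the shape of the feedback: each round passes per-subtask failure notes to the revision step, in contrast to the single scalar success rate that the abstract attributes to earlier methods.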


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
