arXiv

ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors

Title: ExpertGen: Enabling Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavioral Priors

Abstract:

Acquiring generalizable and robust behavior cloning policies typically demands extensive datasets of high-quality robotics data. Although human demonstrations, such as those gathered via teleoperation, are the conventional standard for expert behaviors, collecting such data on a large scale in physical environments is often cost-prohibitive. To address this, we present ExpertGen, a novel framework that automates the learning of expert policies within simulation to facilitate scalable sim-to-real transfer.

ExpertGen first establishes a behavior prior by training a diffusion policy on imperfect demonstrations, which may be generated by large language models or supplied by human operators. Reinforcement learning then steers this prior toward high task success by optimizing the diffusion model's initial noise while the policy weights stay frozen. Because the pretrained diffusion policy is never updated, exploration is constrained to safe, human-like behavior manifolds, which regularizes learning and makes it effective even with sparse rewards.
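The idea of steering a frozen diffusion policy through its initial noise can be illustrated with a small toy sketch. Everything here is an assumption made for illustration and not the paper's implementation: `frozen_denoiser` is a hand-written stand-in for a pretrained diffusion policy (it pulls any sample onto the unit circle, which plays the role of the behavior manifold), `TARGET` and `sparse_reward` are hypothetical, and plain REINFORCE is swapped in as the reinforcement learning algorithm because the abstract does not name one.

```python
import numpy as np

TARGET = np.array([0.0, 1.0])  # hypothetical goal action on the toy manifold


def frozen_denoiser(noise, steps=10):
    """Toy stand-in for a frozen diffusion policy (NOT a trained model).

    Deterministically maps initial noise to an action by repeatedly pulling
    the sample toward the unit circle, our stand-in behavior manifold.
    """
    x = np.array(noise, dtype=float)
    for _ in range(steps):
        norm = np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8
        x = x + 0.5 * (x / norm - x)  # halve the distance to the circle
    return x


def sparse_reward(action, tol=0.3):
    """Return 1.0 when the action lands within tol of TARGET, else 0.0."""
    return (np.linalg.norm(action - TARGET, axis=-1) < tol).astype(float)


def train_noise_policy(iters=300, batch=256, sigma=0.5, lr=0.2, seed=0):
    """REINFORCE on the mean of a Gaussian over initial noise.

    Only `mu` is learned; the denoiser has no trainable state here.
    """
    rng = np.random.default_rng(seed)
    mu = np.zeros(2)
    for _ in range(iters):
        w = mu + sigma * rng.standard_normal((batch, 2))  # sampled initial noise
        r = sparse_reward(frozen_denoiser(w))             # sparse 0/1 reward
        adv = r - r.mean()                                # mean-reward baseline
        grad = (adv[:, None] * (w - mu)).mean(axis=0) / sigma**2
        mu = mu + lr * grad                               # gradient ascent on reward
    return mu


def success_rate(mu, sigma=0.5, n=2000, seed=1):
    """Monte Carlo estimate of task success for a given noise mean."""
    rng = np.random.default_rng(seed)
    w = mu + sigma * rng.standard_normal((n, 2))
    return sparse_reward(frozen_denoiser(w)).mean()


if __name__ == "__main__":
    mu = train_noise_policy()
    print("success before:", success_rate(np.zeros(2)))
    print("success after: ", success_rate(mu))
```

In this toy, every action produced during exploration stays on the circle regardless of the sampled noise, which mirrors the regularizing effect the abstract attributes to keeping the pretrained policy frozen.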

Experiments on challenging manipulation benchmarks show that ExpertGen consistently produces high-quality expert policies without reward engineering. It achieves an overall success rate of 90.5% on industrial assembly tasks and 85% on long-horizon manipulation tasks, outperforming all baseline methods. The resulting policies exhibit dexterous control and remain robust across varied initial configurations and failure states. To validate sim-to-real transfer, the learned state-based expert policies were distilled into visuomotor policies using DAgger and successfully deployed on real robotic hardware.
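The DAgger distillation step can be sketched in a few lines. This is a minimal toy under stated assumptions, not the paper's pipeline: the scalar environment, `expert_policy`, the `observe` function (a hypothetical stand-in for camera input), and the linear least-squares student are all invented for illustration. What it preserves is the core DAgger loop: roll out the current student, label the states it visits with the expert's actions, aggregate the data, and refit.

```python
import numpy as np


def expert_policy(state):
    """Hypothetical state-based expert: drives the scalar state toward zero."""
    return -0.5 * state


def observe(state):
    """Hypothetical stand-in for a camera; the student never sees raw state."""
    return np.array([2.0 * state, 1.0])


def student_action(weights, obs):
    """Linear student policy acting on observations only."""
    return float(obs @ weights)


def dagger(iters=5, horizon=20, seed=0):
    """Toy DAgger loop: aggregate expert labels on states the student visits."""
    rng = np.random.default_rng(seed)
    obs_data, act_data = [], []
    weights = np.zeros(2)
    for i in range(iters):
        state = rng.uniform(-1.0, 1.0)
        for _ in range(horizon):
            obs = observe(state)
            obs_data.append(obs)
            act_data.append(expert_policy(state))  # expert labels every visited state
            # First iteration follows the expert; later ones follow the student.
            action = expert_policy(state) if i == 0 else student_action(weights, obs)
            state = state + action                 # toy dynamics: s' = s + a
        # Refit the student on the aggregated dataset.
        weights, *_ = np.linalg.lstsq(
            np.array(obs_data), np.array(act_data), rcond=None
        )
    return weights


if __name__ == "__main__":
    w = dagger()
    print("student weights:", w)
    print("student action at state 0.8:", student_action(w, observe(0.8)))
```

Because the expert labels states reached under the student's own actions, the student is trained on the distribution it will actually encounter at deployment, which is the property that makes DAgger a natural fit for distilling a privileged state-based expert.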


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
