arXiv

ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors

June 2, 2026 · Zifan Xu, Ran Gong, Maria Vittoria Minniti, Kausik Sivakumar, Ahmet Salih Gundogdu, Eric Rosen, Riedana Yan, Tushar Kusnur, Zixing Wang, Di Deng, Peter Stone, Xiaohan Zhang, Karl Schmeckpeper

Title: ExpertGen: Enabling Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavioral Priors

Abstract:

Acquiring generalizable and robust behavior cloning policies typically demands extensive datasets of high-quality robotics data. Although human demonstrations, such as those gathered via teleoperation, are the conventional standard for expert behaviors, collecting such data on a large scale in physical environments is often cost-prohibitive. To address this, we present ExpertGen, a novel framework that automates the learning of expert policies within simulation to facilitate scalable sim-to-real transfer.

ExpertGen first establishes a behavior prior by training a diffusion policy on imperfect demonstrations, which may be generated by large language models or supplied by human operators. Reinforcement learning then steers this prior toward high task success by optimizing the diffusion model's initial noise while the policy weights stay frozen. Because the pretrained diffusion policy is never updated, exploration is constrained to safe, human-like behavior manifolds, which regularizes learning and keeps it effective even with sparse rewards.
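The idea of steering a frozen prior through its initial noise can be illustrated with a minimal one-dimensional sketch. Everything here is an assumption made for illustration, not the authors' implementation: the toy `denoise` function stands in for a pretrained diffusion policy, the constants are arbitrary, and a simple accept-if-not-worse random search stands in for the reinforcement learning algorithm, which the abstract does not specify. Only the mean and scale of the initial noise are searched; the denoiser itself is never modified.

```python
import math
import random

# Frozen behavior prior: a toy deterministic denoiser standing in for a
# pretrained diffusion policy. These constants are assumed for illustration
# and are never updated by the search below.
PRIOR_MODE = 0.8     # where the imperfect demonstrations concentrate
DENOISE_RATE = 0.2   # per-step pull toward the prior mode
DENOISE_STEPS = 6

TARGET = 1.0         # action that actually solves the toy task
TOLERANCE = 0.1      # sparse reward: 1 inside this band, 0 outside


def denoise(z):
    """Map initial noise z to an action by iterating the frozen denoiser."""
    a = z
    for _ in range(DENOISE_STEPS):
        a += DENOISE_RATE * (PRIOR_MODE - a)
    return a


def sparse_reward(action):
    """Binary task success, with no shaped reward terms."""
    return 1.0 if abs(action - TARGET) <= TOLERANCE else 0.0


def success_rate(mu, log_sigma, eps_batch):
    """Average sparse reward when the initial noise is z = mu + sigma * eps."""
    sigma = math.exp(log_sigma)
    hits = sum(sparse_reward(denoise(mu + sigma * e)) for e in eps_batch)
    return hits / len(eps_batch)


def steer_noise(iterations=300, batch=256, step=0.3, seed=0):
    """Search over the noise distribution only; the denoiser stays frozen."""
    rng = random.Random(seed)
    # Reuse one batch of base noise so candidates are compared fairly.
    eps_batch = [rng.gauss(0.0, 1.0) for _ in range(batch)]
    mu, log_sigma = 0.0, 0.0  # z ~ N(0, 1), i.e. the unsteered prior
    prior_rate = best = success_rate(mu, log_sigma, eps_batch)
    for _ in range(iterations):
        cand_mu = mu + step * rng.gauss(0.0, 1.0)
        cand_ls = log_sigma + step * rng.gauss(0.0, 1.0)
        rate = success_rate(cand_mu, cand_ls, eps_batch)
        if rate >= best:  # keep candidates that are at least as good
            mu, log_sigma, best = cand_mu, cand_ls, rate
    return (mu, log_sigma), prior_rate, best


if __name__ == "__main__":
    params, prior_rate, tuned_rate = steer_noise()
    print("noise params:", params)
    print("success before steering:", prior_rate)
    print("success after steering:", tuned_rate)
```

Because every action still passes through `denoise`, the search can only shift which of the prior's behaviors are sampled. It cannot produce actions that the frozen model would not generate from some initial noise, which is the regularizing effect described above.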

Empirical tests on challenging manipulation benchmarks indicate that ExpertGen consistently generates high-quality expert policies without the need for reward engineering. In industrial assembly scenarios, the framework achieved an overall success rate of 90.5%, while reaching 85% on long-horizon manipulation tasks, surpassing all existing baseline methods. The resulting policies demonstrate dexterous control and maintain robustness across various initial configurations and failure states. To confirm the efficacy of sim-to-real transfer, the learned state-based expert policies were distilled into visuomotor policies using DAgger and successfully deployed on actual robotic hardware.
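The DAgger distillation step can be sketched on a toy problem. This is a hedged illustration under assumptions that do not come from the paper: the expert is a hand-written linear controller, `observe` is a made-up affine map standing in for camera input, and the student is a one-dimensional linear model fit by least squares. The loop itself follows the standard DAgger recipe: roll out the current student, label every visited state with the expert's action, aggregate the data, and retrain.

```python
import random


def expert_action(state):
    """State-based expert (toy stand-in): drive the state toward zero."""
    return -0.5 * state


def observe(state):
    """Toy stand-in for a camera: the student sees this, never the raw state."""
    return 2.0 * state + 1.0


def fit_linear(observations, actions):
    """Closed-form least squares for action = w * observation + b."""
    n = len(observations)
    mean_o = sum(observations) / n
    mean_a = sum(actions) / n
    var_o = sum((o - mean_o) ** 2 for o in observations)
    cov = sum((o - mean_o) * (a - mean_a)
              for o, a in zip(observations, actions))
    w = cov / var_o
    return w, mean_a - w * mean_o


def dagger(iterations=5, episodes=10, horizon=5, seed=0):
    rng = random.Random(seed)
    observations, actions = [], []  # aggregated dataset across iterations
    w, b = 0.0, 0.0                 # student parameters
    for it in range(iterations):
        for _ in range(episodes):
            state = rng.uniform(-1.0, 1.0)
            for _ in range(horizon):
                obs = observe(state)
                label = expert_action(state)  # expert labels each visited state
                observations.append(obs)
                actions.append(label)
                # The first iteration follows the expert; later ones follow the
                # student, so the data covers states the student actually reaches.
                executed = label if it == 0 else w * obs + b
                state = state + executed
        w, b = fit_linear(observations, actions)  # retrain on all data so far
    return w, b


if __name__ == "__main__":
    w, b = dagger()
    print("student: action = %.3f * observation + %.3f" % (w, b))
```

In this noise-free toy the student recovers the expert almost immediately. The value of relabeling on student rollouts shows up when the student is imperfect and drifts into states the expert alone would never visit.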

Source: arXiv