RoboDream: Compositional World Models for Scalable Robot Data Synthesis
Abstract: Scaling robot learning depends on large and varied demonstration datasets, but collecting real-world data through teleoperation is prohibitively costly and labor-intensive. Video diffusion models offer a path to scaling data, yet current generative methods are often limited to superficial visual augmentation or are prone to "embodiment hallucinations" that produce physically implausible robot motions. To address these challenges, we introduce RoboDream, a generalizable, embodiment-centric world model for scalable data generation. It synthesizes photorealistic demonstrations with new objects, in unfamiliar scenes, and from previously unseen viewpoints. By grounding generation in rendered robot motions and conditioning on scene and object priors, the method decouples trajectory execution from environment synthesis. This decoupling enables two ways to scale data: first, "retrieval and rebirth," which reuses existing motion trajectories in entirely new contexts without collecting new motion data; and second, "prop-free teleoperation," in which human operators perform the manipulation in empty space and the model then generates the target objects and scene, removing the need for physical resets. Real-world experiments show that data generated this way consistently improves downstream policy performance and markedly reduces the amount of real-world data needed across a variety of manipulation tasks.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





