arXiv

PerceptTwin: Semantic Scene Reconstruction for Iterative LLM Planning and Verification

June 4, 2026 · Charlie Gauthier, Sacha Morin, Liam Paull

Abstract:

Simulation environments play a critical role in both learning robot policies and verifying planning strategies. Historically, building such simulations has been burdensome, making it impractical to create bespoke environments tailored to the operational context of an individual robot. To address this challenge, we present PerceptTwin, a fully automated pipeline that generates interactive simulations directly from semantic scene representations derived from a robot’s perception stack. PerceptTwin integrates open-vocabulary object maps with 3D asset generation, affordance prediction, and commonsense condition checking. The resulting simulations allow plans to be validated and refined before they are executed on physical robot hardware. Drawing inspiration from AI alignment research, we further introduce an LLM-based judge designed to assess plan correctness and ensure alignment with human preferences. Our experimental results indicate that PerceptTwin’s feedback helps LLM planners refine their strategies, improve safety, and withstand harmful black-box prompting attacks. Across our task suite, PerceptTwin increases plan success rates by approximately 39% on average for planners using GPT5, GPT5Mini, and GPT5Nano. It also improves human plan verification by up to 18% on average, particularly for plans that fail due to unmet skill preconditions. These findings highlight the potential of open-vocabulary scene simulation, sourced from robot perception, as a robust foundation for safer and more reliable robot planning systems.
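
To make the plan-then-verify loop described in the abstract concrete, here is a minimal Python sketch under stated assumptions. It is not PerceptTwin's implementation: the `Skill` class, the `simulate` and `refine` functions, the fact names, and the toy planner are all hypothetical, and a symbolic set of facts stands in for the interactive simulation. It only illustrates how a report of unmet skill preconditions could be fed back to a planner so that it can revise a plan before execution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """Hypothetical skill model: facts required before it runs, facts it changes."""
    name: str
    preconditions: frozenset
    add_effects: frozenset = frozenset()
    del_effects: frozenset = frozenset()


def simulate(plan, initial_state, skills):
    """Step through the plan symbolically and return (ok, feedback)."""
    state = set(initial_state)
    for step, name in enumerate(plan):
        skill = skills[name]
        missing = skill.preconditions - state
        if missing:
            # Stop at the first skill whose preconditions are unmet.
            return False, f"step {step} ({name}): unmet preconditions {sorted(missing)}"
        state -= skill.del_effects
        state |= skill.add_effects
    return True, "ok"


def refine(planner, initial_state, skills, max_rounds=3):
    """Ask the planner for a plan, simulate it, and pass any failure back."""
    feedback = None
    for _ in range(max_rounds):
        plan = planner(feedback)
        ok, feedback = simulate(plan, initial_state, skills)
        if ok:
            return plan
    return None  # no valid plan found within the round budget


# Hypothetical scene facts and skills, for illustration only.
SKILLS = {
    "open_drawer": Skill(
        "open_drawer",
        preconditions=frozenset({"drawer_closed"}),
        add_effects=frozenset({"drawer_open"}),
        del_effects=frozenset({"drawer_closed"}),
    ),
    "pick_cup": Skill(
        "pick_cup",
        preconditions=frozenset({"drawer_open", "hand_empty"}),
        add_effects=frozenset({"holding_cup"}),
        del_effects=frozenset({"hand_empty"}),
    ),
}
INITIAL = {"drawer_closed", "hand_empty"}


def toy_planner(feedback):
    """Stand-in for an LLM planner: adds the missing step once it sees feedback."""
    if feedback is None:
        return ["pick_cup"]
    return ["open_drawer", "pick_cup"]


if __name__ == "__main__":
    print(simulate(["pick_cup"], INITIAL, SKILLS))
    print(refine(toy_planner, INITIAL, SKILLS))
```

In this sketch the first plan fails because the drawer is still closed, and the second attempt succeeds after the planner receives that feedback. A real system would replace `toy_planner` with an LLM call and the symbolic state with a simulation built from the robot's perception.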

Source: arXiv Generated at: 2026-06-04 00:00:00 UTC