arXiv

HomeFlow: A Data Flywheel for Smart Home Agent Training with Verifiable Simulation

June 2, 2026 · Yi Gu, Huacan Wang, Shuo Zhang, Yuqing Hou, Lei Xue, Weipeng Ming, Chen Liu, Fangzhou Yu, Kuan Li, Ronghao Chen, Sen Hu, Xiaofeng Mou, Yi Xu

Abstract

Large language model agents are transitioning from text-centric interactions to controlling the physical world, with smart homes serving as a prime example. Effective operation in real domestic settings demands the ability to interpret ambiguous user intentions, adapt to dynamic environments, and execute complex multi-turn reasoning. Yet, current approaches face significant challenges in producing high-quality training data for these agents. To address this gap, we introduce HomeFlow, a verifiable data flywheel tailored for this domain.

The framework combines HomeEnv, a unified simulation environment, with HomeMaker, which procedurally generates a wide variety of home layouts. The Blueprint component translates open-ended user intents into executable, state-based success criteria, and MCTS-Flow uses environment-guided tree search to synthesize diverse, verifiable multi-turn interaction trajectories. We then train agents with supervised fine-tuning and step-wise RLVE, which improves them iteratively using real physical feedback from the environment.
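
The idea behind Blueprint, turning an intent into state-based success criteria that a simulator can check, can be illustrated with a minimal Python sketch. Everything here is a hypothetical illustration rather than HomeFlow's actual interface: the `Criterion` class, the dict-based `HomeState`, the device names, and especially `stepwise_rewards` (reward = change in the fraction of satisfied criteria after each action) are assumptions, since the abstract does not say how step-wise RLVE assigns rewards.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical home state: device id -> attribute name -> value.
HomeState = Dict[str, Dict[str, Any]]

# Illustrative comparison operators a criterion may use.
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "eq": lambda value, target: value == target,
    "le": lambda value, target: value <= target,
    "ge": lambda value, target: value >= target,
}


@dataclass(frozen=True)
class Criterion:
    """One state-based success condition (hypothetical schema), e.g. tv.power == "on"."""
    device: str
    attribute: str
    op: str
    target: Any

    def holds(self, state: HomeState) -> bool:
        # A missing device or attribute counts as "not satisfied".
        value = state.get(self.device, {}).get(self.attribute)
        if value is None:
            return False
        return OPS[self.op](value, self.target)


def progress(blueprint: List[Criterion], state: HomeState) -> float:
    """Fraction of the blueprint's criteria satisfied in `state`."""
    if not blueprint:
        raise ValueError("blueprint must contain at least one criterion")
    return sum(c.holds(state) for c in blueprint) / len(blueprint)


def is_success(blueprint: List[Criterion], state: HomeState) -> bool:
    """The task succeeds only if every criterion holds in the final state."""
    return all(c.holds(state) for c in blueprint)


def stepwise_rewards(blueprint: List[Criterion], states: List[HomeState]) -> List[float]:
    """Hypothetical dense reward: change in progress caused by each action.

    states[0] is the initial state; states[i] is the state after action i.
    """
    scores = [progress(blueprint, s) for s in states]
    return [after - before for before, after in zip(scores, scores[1:])]


# Usage: a hypothetical blueprint for "dim the living room light and turn on the TV".
blueprint = [
    Criterion("living_room_light", "power", "eq", "on"),
    Criterion("living_room_light", "brightness", "le", 30),
    Criterion("tv", "power", "eq", "on"),
]
trajectory = [
    {"living_room_light": {"power": "off", "brightness": 100}, "tv": {"power": "off"}},
    {"living_room_light": {"power": "on", "brightness": 100}, "tv": {"power": "off"}},
    {"living_room_light": {"power": "on", "brightness": 20}, "tv": {"power": "off"}},
    {"living_room_light": {"power": "on", "brightness": 20}, "tv": {"power": "on"}},
]
print(stepwise_rewards(blueprint, trajectory))  # roughly 1/3 per step
print(is_success(blueprint, trajectory[-1]))     # True
```

Because success is judged on the final device state rather than on the exact sequence of tool calls, any trajectory that reaches the goal state passes, which is what makes criteria of this kind mechanically verifiable.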

To evaluate agent capabilities, we developed SmartHome-Bench, a benchmark covering a broad range of smart home tasks. HomeFlow-RL-4B and HomeFlow-RL-8B achieve task success rates of 84.60% and 87.03%, respectively, and HomeFlow-RL-8B outperforms the leading model, GPT-5.5, by 1.23 percentage points.
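
Because the comparison is stated as an absolute gap in percentage points, it can help to separate it from a relative gain. The short Python sketch below does that arithmetic under two assumptions not stated explicitly above: the 1.23-point margin is measured on the same success rate as the 87.03% figure, and neither number involves further rounding. Under those assumptions GPT-5.5 scores about 85.80%, and the gap corresponds to a relative gain of roughly 1.43%. The function names are illustrative only.

```python
def implied_baseline(score_pct: float, margin_pp: float) -> float:
    """Baseline success rate implied by an absolute margin in percentage points."""
    return score_pct - margin_pp


def relative_gain_pct(score_pct: float, baseline_pct: float) -> float:
    """Relative improvement over the baseline, expressed in percent."""
    return (score_pct - baseline_pct) / baseline_pct * 100


homeflow_rl_4b = 84.60
homeflow_rl_8b = 87.03
gpt_5_5 = implied_baseline(homeflow_rl_8b, 1.23)  # assumes the margin is exact

print(round(gpt_5_5, 2))                                      # 85.8
print(round(relative_gain_pct(homeflow_rl_8b, gpt_5_5), 2))   # 1.43
print(round(homeflow_rl_8b - homeflow_rl_4b, 2))              # 2.43 (8B vs 4B, in points)
```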