arXiv

Plan, Watch, Recover: A Benchmark and Architectures for Proactive Procedural Assistance

June 4, 2026 · Kaustav Kundu, Ritvik Shrivastava, Maxim Arap, Nanshu Wang, Xianhui Zhu, Quintin Fettes, Gautam Tiwari, Parth Suresh, Théo Moutakanni, Alejandro Castillejo Munoz, Allen Bolourchi, Pascale Fung, Pinar Donmez, Babak Damavandi, Anuj Kumar, Seungwhan Moon

Abstract

We propose a framework for a proactive, multimodal assistant that gives real-time, step-by-step guidance for procedural tasks, deciding on its own when to interrupt and how to coach. Progress has been limited by the lack of large-scale, cross-domain benchmarks that simulate realistic scenarios, especially cases where users deviate from the expected sequence of steps. To close this gap, we make four contributions: (1) EgoProactive, a wearable egocentric dataset for proactive procedural assistance with explicit annotations for Out-of-Plan (OOP) deviations and the corresponding recovery actions; (2) Pro²Bench, which extends five established benchmarks (Ego4D, EPIC-KITCHENS, EgoExo4D, HoloAssist, and HowTo100M) under a unified schema for proactive guidance; (3) a decoupled planner–interaction architecture designed to handle procedural state, visual signals, and the injection of recovery steps; and (4) a post-training methodology that transfers across model families, validated by cross-backbone replication with Llama 4 and Qwen-3.6-VL. In our experiments, the Llama-4 system significantly improves objective intervention quality over both strong proprietary baselines (Claude Opus 4.6, Gemini 3.1 Pro, GPT 5.2) and open-weight models (Qwen3 VL 235B) across all six datasets. Oracle-plan experiments further show that, with plan quality held constant, the trained duplex model delivers high-quality guidance and substantially improves Out-of-Plan recovery performance.
