arXiv

Plan, Watch, Recover: A Benchmark and Architectures for Proactive Procedural Assistance

Abstract

We propose a framework for a proactive, multimodal assistant that provides real-time, step-by-step guidance for procedural tasks, deciding on its own when to interrupt and how to coach. Progress has been hindered by the lack of large-scale, cross-domain benchmarks that capture realistic scenarios, especially cases where users diverge from the expected sequence of steps. To bridge this gap, we make four contributions: (1) EgoProactive, a wearable egocentric dataset for proactive procedural assistance with explicit annotations for Out-of-Plan (OOP) deviations and the corresponding recovery actions; (2) Pro²Bench, which extends five established benchmarks (Ego4D, EPIC-KITCHENS, EgoExo4D, HoloAssist, and HowTo100M) under a unified schema for proactive guidance; (3) a decoupled planner–interaction architecture that handles procedural state, visual signals, and the injection of recovery steps; and (4) a post-training methodology that transfers across model families, validated by cross-backbone replication with Llama 4 and Qwen-3.6-VL. Extensive experiments show that the Llama 4 system significantly improves objective intervention quality over both strong proprietary baselines (Claude Opus 4.6, Gemini 3.1 Pro, GPT 5.2) and open-weight models (Qwen3 VL 235B) across all six datasets. Furthermore, oracle-plan experiments show that when plan quality is held constant, the trained duplex model delivers high-quality guidance and substantially improves Out-of-Plan recovery performance.


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
