arXiv

See, Infer, Intervene: Proactive World Modeling for Goal-Oriented Social Intelligence

Abstract:

To move beyond mere observation, multimodal retail agents must anticipate customer needs and decide when and how to assist before any explicit request is made. This study explores that capability through the See–Infer–Intervene (SII) framework. In this framework, a system first observes pre-interaction behaviors, infers the customer's hidden intentions, and then decides whether to execute a specific service intervention or remain passive.

We implement the SII framework using the Proactive Intent World Model (PIWM). This model characterizes customer status through AIDA purchasing stages (Attention, Interest, Desire, Action) and BDI psychological dimensions (belief, desire, intention). It forecasts intent shifts conditioned on actions and chooses from five distinct response categories: Greet, Elicit, Inform, Recommend, and Hold. To support this research, we introduce GuidanceSalesBench, a comprehensive smart-retail benchmark featuring state manifests, pre-interaction footage, potential responses, action-conditioned outcomes, and labels for the optimal action.
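As a rough illustration of the state and action spaces described above, the following minimal sketch encodes the AIDA stages, the BDI dimensions, and the five response categories as Python types. The names `CustomerState`, `select_action`, `predict_outcome`, and `score_outcome` are hypothetical, and choosing the action whose predicted outcome scores highest is an assumption made for illustration; the abstract does not specify how PIWM makes its choice.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class AidaStage(Enum):
    """AIDA purchasing stages named in the abstract."""
    ATTENTION = "attention"
    INTEREST = "interest"
    DESIRE = "desire"
    ACTION = "action"


class Action(Enum):
    """The five response categories named in the abstract."""
    GREET = "greet"
    ELICIT = "elicit"
    INFORM = "inform"
    RECOMMEND = "recommend"
    HOLD = "hold"


@dataclass(frozen=True)
class CustomerState:
    """Hypothetical container: one AIDA stage plus free-text BDI fields."""
    stage: AidaStage
    belief: str
    desire: str
    intention: str


def select_action(
    state: CustomerState,
    predict_outcome: Callable[[CustomerState, Action], CustomerState],
    score_outcome: Callable[[CustomerState], float],
) -> Action:
    """Pick the action whose predicted next state scores highest.

    Both callables are placeholders (assumptions): the first stands in for
    an action-conditioned forecast, the second for a goal-specific utility.
    Ties resolve to the earliest action in definition order.
    """
    scores = {a: score_outcome(predict_outcome(state, a)) for a in Action}
    return max(Action, key=lambda a: scores[a])
```

Keeping the forecast and the scoring function as separate arguments means the same selection loop can be exercised with ground-truth states or with states inferred from video.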

When PIWM is conditioned on ground-truth customer states to isolate the action-selection process, it achieves a macro F1 score of 0.641 across 30 held-out target videos. This performance surpasses both a zero-shot Qwen2.5-VL-7B baseline and training variants lacking balanced action supervision. End-to-end selection from video input alone, however, scores only 0.295, below the 0.414 of a 5-class balanced random baseline. This gap identifies video-to-state grounding as the primary bottleneck for deployment. Additionally, a preliminary staged pilot in a real store, in which paid participants enacted scripted customer behaviors, yielded an action macro F1 of 0.579 on 20 fully annotated videos. We also release 10 additional accessible videos accompanied by index-level labels.
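For readers unfamiliar with the metric, macro F1 is the unweighted mean of the per-class F1 scores, so each of the five actions counts equally regardless of how often it occurs. The sketch below shows the standard computation in plain Python; the label strings and the convention of scoring a class as 0.0 when it has no true or predicted instances are assumptions, not details taken from the paper.

```python
from typing import Sequence

# Assumed label strings for the five response categories.
ACTIONS = ("Greet", "Elicit", "Inform", "Recommend", "Hold")


def macro_f1(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    labels: Sequence[str] = ACTIONS,
) -> float:
    """Unweighted mean of per-class F1 over the given labels."""
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # Assumed convention: a class with no support and no predictions scores 0.0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)
```

Because every class contributes equally, a model that always predicts one action scores poorly even when that action is the most frequent label.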


Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC
