arXiv

NoRA: Evaluating Grounded Reasonableness in Visual First-person Normative Action Reasoning

Abstract:

As Large Language Models (LLMs) and agentic systems are increasingly integrated into social settings, the ability to demonstrate normative competence has become essential for ensuring safe and appropriate conduct. Yet, current evaluation methods are flawed: they either restrict normative judgment to text-only contexts or simplify the task into selecting from a predetermined list of actions. We contend that these approaches fail to capture real-world complexity. In practical scenarios, agents are not provided with a menu of choices; instead, they must independently identify a reasonable course of action based on visible evidence and provide inspectable justifications.

To address this, we present NoRA, a novel benchmark for visual first-person video reasoning. NoRA challenges models to generate potential next actions and substantiate each choice using an explicit support graph that links facts, reasons, and actions. The dataset consists of 1,420 annotated video clips, divided into a HumanGold-190 split and an LLMSilver-1230 split. Evaluation metrics include action alignment, factual grounding, and support binding, which are synthesized into a comprehensive grounded reasonableness score.
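The abstract does not specify how a support graph is represented or scored, so the following is only a minimal illustrative sketch of the idea of linking facts, reasons, and actions. The class `SupportGraph`, its fields, and the `is_bound` check are hypothetical names invented for illustration; they are not NoRA's actual schema or metric.

```python
from dataclasses import dataclass, field


@dataclass
class SupportGraph:
    """Hypothetical container for a facts -> reasons -> actions graph."""
    # fact id -> description of visible evidence
    facts: dict[str, str] = field(default_factory=dict)
    # reason id -> ids of the facts the reason rests on
    reasons: dict[str, list[str]] = field(default_factory=dict)
    # action text -> ids of the reasons offered for it
    actions: dict[str, list[str]] = field(default_factory=dict)

    def is_bound(self, action: str) -> bool:
        """True if the action cites at least one reason and every cited
        reason exists and rests only on facts present in the graph."""
        reason_ids = self.actions.get(action, [])
        if not reason_ids:
            return False
        for rid in reason_ids:
            fact_ids = self.reasons.get(rid)
            if not fact_ids or any(fid not in self.facts for fid in fact_ids):
                return False
        return True

    def bound_fraction(self) -> float:
        """Share of proposed actions that are fully bound (assumed toy measure)."""
        if not self.actions:
            return 0.0
        return sum(self.is_bound(a) for a in self.actions) / len(self.actions)


# Toy example: one action is traceable to a visible fact, the other is not.
graph = SupportGraph(
    facts={"f1": "a person ahead is carrying boxes toward a closed door"},
    reasons={"r1": ["f1"], "r2": ["f9"]},  # "f9" is deliberately missing
    actions={"hold the door open": ["r1"], "walk past": ["r2"]},
)
```

In this toy case only "hold the door open" traces back to a recorded fact, which is the kind of inspectable link between an action and its local support that the benchmark is described as checking.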

We tested 12 multimodal systems across direct, deliberate, and structured prompting conditions. Our results indicate that while current Vision-Language Models (VLMs) are generally capable of identifying plausible actions and relevant scene details, they consistently fail to construct a complete space of reasonable actions and struggle to correctly link selected actions to their specific local support. NoRA quantifies this deficiency, reframing the core evaluation question from whether a model can merely select an action to whether it can justify an appropriate action based on the correct visible reasons.


Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
