arXiv

NoRA: Evaluating Grounded Reasonableness in Visual First-person Normative Action Reasoning

June 4, 2026 · Sichao Li, Sai Ma, Daniel Kilov, Secil Yanik Guyot, Zhuang Li, Seth Lazar

Abstract:

As Large Language Models (LLMs) and agentic systems are increasingly integrated into social settings, the ability to demonstrate normative competence has become essential for ensuring safe and appropriate conduct. Yet, current evaluation methods are flawed: they either restrict normative judgment to text-only contexts or simplify the task into selecting from a predetermined list of actions. We contend that these approaches fail to capture real-world complexity. In practical scenarios, agents are not provided with a menu of choices; instead, they must independently identify a reasonable course of action based on visible evidence and provide inspectable justifications.

To address this, we present NoRA, a benchmark for normative action reasoning over first-person video. NoRA challenges models to generate candidate next actions and to justify each one with an explicit support graph that links facts, reasons, and actions. The dataset consists of 1,420 annotated video clips, divided into a HumanGold-190 split and an LLMSilver-1230 split. Evaluation metrics cover action alignment, factual grounding, and support binding, which are combined into a single grounded reasonableness score.
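To make the idea of a support graph concrete, the sketch below shows one possible way to represent links between facts, reasons, and actions, and to flag actions that lack a complete chain back to a visible fact. It is illustrative only: the abstract does not specify NoRA's annotation schema or how its metrics are computed, so the class name `SupportGraph`, its fields, the method `unsupported_actions`, and the example scene are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class SupportGraph:
    """Hypothetical container (not NoRA's schema): facts -> reasons -> actions."""

    # fact id -> visible observation
    facts: dict[str, str] = field(default_factory=dict)
    # reason id -> ids of the facts that support it
    reasons: dict[str, list[str]] = field(default_factory=dict)
    # action id -> ids of the reasons that support it
    actions: dict[str, list[str]] = field(default_factory=dict)

    def unsupported_actions(self) -> list[str]:
        """Return actions with no reason that traces back to a known fact."""
        bad = []
        for action, reason_ids in self.actions.items():
            grounded = [
                r for r in reason_ids
                if any(f in self.facts for f in self.reasons.get(r, []))
            ]
            if not grounded:
                bad.append(action)
        return bad


# Invented example scene, for illustration only.
graph = SupportGraph(
    facts={
        "f1": "a person ahead is carrying two boxes",
        "f2": "the door ahead is closed",
    },
    reasons={"r1": ["f1", "f2"], "r2": ["f9"]},  # "f9" is not an observed fact
    actions={"hold_door": ["r1"], "walk_past": ["r2"], "wave": []},
)
print(graph.unsupported_actions())  # ['walk_past', 'wave']
```

Here `hold_door` is backed by a reason that cites observed facts, while `walk_past` cites a reason with no observed fact and `wave` cites no reason at all. A structural check of this kind is only a loose analogue of the support binding the abstract describes; it says nothing about whether the cited facts are the correct ones.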

We tested 12 multimodal systems across direct, deliberate, and structured prompting conditions. Our results indicate that while current Vision-Language Models (VLMs) are generally capable of identifying plausible actions and relevant scene details, they consistently fail to construct a complete space of reasonable actions and struggle to correctly link selected actions to their specific local support. NoRA quantifies this deficiency, reframing the core evaluation question from whether a model can merely select an action to whether it can justify an appropriate action based on the correct visible reasons.
