arXiv

Silent Failures in Physical AI: A Literature Review of Runtime Action Authorization for Autonomous Systems

June 2, 2026 · Barak Or

Original: arXiv:2606.00090v1 · Announce Type: cross

Abstract: Physical AI systems increasingly map multimodal observations, language instructions, and learned world representations into physically consequential actions. Robotics foundation models, vision-language-action models, and world-model-based autonomous systems can condition decisions that move vehicles, robots, drones, and industrial machines. This transition exposes a safety problem that is not fully captured by conventional AI content moderation or by classical robot safety alone: a black-box model may issue a physically consequential action while appearing confident, plausible, and semantically aligned. The resulting failure can be silent, arising from sensor drift, occlusion, state-estimation error, distribution shift, hallucinated affordances, or invalid physical assumptions before downstream hardware controllers detect a violation. Across embodied foundation models, world models, robotics simulation, embodied safety benchmarks, safe control, runtime assurance, uncertainty estimation, verification, and guardrail evaluation, model capability and safety mechanisms have advanced along largely separate technical tracks. A recurring gap synthesized here is that no single stream surveyed in this review supplies a complete runtime authorization boundary between black-box Physical AI models and physical execution. The resulting analysis develops a bounded problem formulation, a definition of silent physical-action failure, a taxonomy of runtime guardrail functions, and evaluation requirements for comparing guardrails as Physical AI assurance mechanisms.

Rewrite:

Abstract: Physical AI systems increasingly translate multimodal observations, language instructions, and learned world representations into actions with physical consequences. Robotics foundation models, vision-language-action models, and world-model-based autonomous systems can condition decisions that move vehicles, robots, drones, and industrial machines. This shift exposes a safety problem that neither conventional AI content moderation nor classical robot safety fully captures: a black-box model may issue a physically consequential action while appearing confident, plausible, and semantically aligned. The resulting failure can be silent: it may arise from sensor drift, occlusion, state-estimation error, distribution shift, hallucinated affordances, or invalid physical assumptions, and it can occur before downstream hardware controllers detect a violation.

Across embodied foundation models, world models, robotics simulation, embodied safety benchmarks, safe control, runtime assurance, uncertainty estimation, verification, and guardrail evaluation, model capability and safety mechanisms have advanced along largely separate technical tracks. The recurring gap this review identifies is that no single surveyed stream supplies a complete runtime authorization boundary between black-box Physical AI models and physical execution. From this synthesis, the analysis develops a bounded problem formulation, a definition of silent physical-action failure, a taxonomy of runtime guardrail functions, and evaluation requirements for comparing guardrails as Physical AI assurance mechanisms.

Source: arXiv · Generated at: 2026-06-02 00:00:00 UTC