InPhyRe Discovers: Large Multimodal Models Struggle in Inductive Physical Reasoning
Title: InPhyRe Findings: Large Multimodal Models Falter in Inductive Physical Reasoning
Original: arXiv:2509.12263v3 Announce Type: replace Abstract: Large multimodal models (LMMs) encode physical laws observed during training, such as momentum conservation, as parametric knowledge. It allows LMMs to answer physical reasoning queries, such as the outcome of a potential collision event from visual input. However, since parametric knowledge includes only the physical laws seen during training, it is insufficient for reasoning in inference scenarios that follow physical laws unseen during training. In such novel physical environments, humans could adapt their physical reasoning based on provided demonstrations. This inductive physical reasoning ability is indispensable for LMMs if they are to replace human agents in safety-critical applications. Despite its importance, existing visual benchmarks do not evaluate inductive physical reasoning and only consider the parametric knowledge in LMMs. To this end, we propose InPhyRe, the first visual question answering benchmark to measure inductive physical reasoning in LMMs. InPhyRe evaluates LMMs' ability to predict the outcome of collision events in algorithmically generated synthetic videos. By inspecting over 13 open-source and proprietary LMMs, InPhyRe informs us that (1) LMMs struggle to apply their limited parametric knowledge about universal physical laws to reasoning, (2) inductive physical reasoning in LMMs is weak when the physical laws underlying inference scenarios were unseen during training, and (3) inductive physical reasoning in LMMs suffers from language bias and may ignore the visual inputs, questioning the trustworthiness of LMMs regarding visual inputs.
Rewrite: Large multimodal models (LMMs) store physical principles observed during training, such as the conservation of momentum, as parametric knowledge. This knowledge enables them to answer physical reasoning questions, such as predicting the outcome of a potential collision from visual input. However, because parametric knowledge covers only the physical laws encountered during training, it is inadequate for reasoning in inference scenarios governed by physical laws unseen during training. In such unfamiliar physical settings, humans can adapt their reasoning based on provided demonstrations. If LMMs are to replace human agents in safety-critical applications, this capacity for inductive physical reasoning is essential. Yet existing visual benchmarks do not assess inductive physical reasoning and consider only the parametric knowledge within LMMs. To address this gap, we introduce InPhyRe, the first visual question-answering benchmark designed to measure inductive physical reasoning in LMMs. InPhyRe tests LMMs' ability to predict the outcomes of collision events in algorithmically generated synthetic videos. An evaluation of more than 13 open-source and proprietary LMMs on InPhyRe yields three key findings: (1) LMMs struggle to apply their limited parametric knowledge of universal physical laws to reasoning; (2) inductive physical reasoning in LMMs is weak when the physical laws underlying the inference scenarios were unseen during training; and (3) inductive physical reasoning in LMMs suffers from language bias and may ignore the visual inputs, which raises concerns about how trustworthy LMMs are with respect to visual inputs.
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
