NewtPhys: Do Foundation Models Understand Newtonian Physics?
Title: NewtPhys: Can Foundation Models Grasp Newtonian Physics?
Abstract: Current evaluations of physics reasoning in foundation models have relied primarily on synthetic or semi-synthetic environments and visual question-answering formats. Such benchmarks tend to focus on macroscopic events and lack the visual realism needed to test genuine, low-level Newtonian comprehension. To address this, we present NewtPhys, a four-dimensional dataset annotated with physical properties, derived from multiview imagery of real-world scenarios augmented with physics-based simulations. It offers dense, granular annotations across time steps, featuring 3D force vectors and amodal per-pixel data that span physics, tracking, semantics, and geometry, and thereby bridges the gap between overly simple synthetic constructs and the complexity of realistic visuals. Leveraging NewtPhys, we conduct a comprehensive assessment of 56 Vision-Language Models (VLMs), comprising 54 open-weight and 2 closed-source frontier models, as well as 10 Vision Foundation Models (VFMs), uncovering significant shortcomings in their ability to reason about low-level physics. Beyond serving as a benchmark, this dataset facilitates future advances in physics-grounded computer vision and aids in creating more sophisticated, physics-aware evaluation frameworks. The code and data are publicly accessible at https://astra-vision.github.io/NewtPhys.
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC





