arXiv

Title: The Cartesian Shortcut: Re-evaluating Visual Reasoning in Polar Coordinate Space

Abstract:

As Multimodal Large Language Models (MLLMs) rapidly saturate standard visual reasoning benchmarks, a critical question arises: do these high scores truly reflect robust visual comprehension? We identify a pervasive vulnerability that we term the "Cartesian Shortcut." Current visual reasoning benchmarks rely predominantly on orthogonal, grid-based layouts that can be easily discretized into explicit textual coordinates. Models systematically exploit this property, relying heavily on text-based deductive reasoning to solve visual problems.
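
The shortcut can be illustrated with a minimal sketch, assuming a toy grid puzzle: once cells sit on an orthogonal grid, the whole image reduces to a list of explicit (row, column) facts that a language model can reason over as text. The `grid_to_text` helper and the puzzle contents are hypothetical illustrations, not part of Polaris-Bench or of any evaluated model.

```python
def grid_to_text(grid):
    """Serialize a 2-D grid into explicit textual coordinates.

    Hypothetical helper: each non-empty cell becomes a line
    '(row, col): value', the kind of discretization that an
    orthogonal, grid-based layout makes trivial.
    """
    lines = []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value is not None:  # skip empty cells
                lines.append(f"({r}, {c}): {value}")
    return "\n".join(lines)


# Hypothetical 3x3 puzzle; None marks an empty cell.
puzzle = [
    ["A", None, "B"],
    [None, "C", None],
    ["D", None, None],
]
print(grid_to_text(puzzle))
```

Every spatial relation in such a grid (same row, same column, adjacency) can then be answered by comparing integers in this text, without further reference to the image.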

To dismantle this shortcut, we introduce Polaris-Bench, a benchmark that re-formulates 53 visual reasoning tasks in Polar coordinate space and pairs each with its Cartesian counterpart as a reference. Crucially, the re-formulation preserves each task's logical constraints and semantics while breaking the orthogonal prior that models typically exploit.
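
One way to picture such a re-formulation is a minimal sketch under an assumed mapping in which grid rows become concentric rings and grid columns become angular sectors. The functions `cartesian_center` and `polar_center`, and the parameters `cell`, `inner`, and `ring`, are hypothetical; the abstract does not specify how Polaris-Bench constructs its layouts.

```python
import math


def cartesian_center(row, col, cell=1.0):
    """Center of cell (row, col) in an orthogonal grid (y grows downward)."""
    return ((col + 0.5) * cell, (row + 0.5) * cell)


def polar_center(row, col, n_cols, inner=1.0, ring=1.0):
    """Center of the same logical cell in a hypothetical polar layout.

    Assumed mapping: row -> concentric ring, column -> angular sector.
    The origin is the center of the polar layout.
    """
    r = inner + (row + 0.5) * ring               # ring radius encodes the row
    theta = 2 * math.pi * (col + 0.5) / n_cols   # sector angle encodes the column
    return (r * math.cos(theta), r * math.sin(theta))


n_cols = 4
for col in range(n_cols):
    cx, cy = cartesian_center(0, col)
    px, py = polar_center(0, col, n_cols)
    print(f"cell (0, {col}): cartesian=({cx:.2f}, {cy:.2f}) "
          f"polar=({px:.2f}, {py:.2f}) radius={math.hypot(px, py):.2f}")
```

In the grid, the four cells of row 0 all share y = 0.5; in the assumed polar layout they share only their distance from the center (1.5), so the "same row" relation survives as "same ring" but can no longer be read off a horizontal line.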

Comprehensive evaluations of 14 state-of-the-art MLLMs reveal a stark contrast in performance: frontier models that score between 70% and 83% on Cartesian layouts collapse to 31%–39% on their Polar equivalents, even though the paired tasks are logically equivalent. Furthermore, the improvements that reasoning yields on Cartesian layouts are significantly diminished on their Polar equivalents. These findings expose a critical deficiency in current MLLMs: a lack of topology-invariant visual reasoning.


Source: arXiv. Generated at: 2026-06-02 00:00:00 UTC