arXiv

Honey, I Shrunk the Arc de Triomphe!

Original:

arXiv:2606.02379v1 Announce Type: new

Abstract:

Metric scale monocular geometry estimation has seen significant progress through large-scale data aggregation, yet current foundation models suffer from a persistent "scale-collapse" phenomenon: distant landmarks and vast landscapes are metrically underestimated. We hypothesize that this performance gap stems from a training data bottleneck, where existing metric-scale datasets are hardware-constrained to homogeneous vehicle-captured LiDAR or short-range indoor scans, or consist of synthetic data that lacks the semantic complexity of the physical world. To bridge this gap, we curate a new metrically grounded, in-the-wild dataset that we call MetricScenes, gathered from a variety of sources including Internet photo collections and stereo imagery. We estimate camera poses and initial depth maps for each scene using off-the-shelf methods, and recover absolute scale from geo-tagged metadata as well as known stereo camera baselines. We also improve the quality of depth maps derived from MetricScenes via a new two-stage Poisson completion method. Fine-tuning MoGe-2 on our dataset significantly mitigates scale-collapse and achieves superior metric accuracy in unconstrained, open-domain scenes while maintaining state-of-the-art performance on standard benchmarks.

Rewrite:

arXiv:2606.02379v1 Announcement Type: New

Abstract:

Large-scale data aggregation has driven substantial progress in metric-scale monocular geometry estimation, yet current foundation models still exhibit a persistent "scale-collapse" failure: distant landmarks and vast landscapes are systematically underestimated in metric terms. We hypothesize that this gap stems from a training data bottleneck: existing metric-scale datasets are hardware-constrained to homogeneous vehicle-captured LiDAR or short-range indoor scans, or they consist of synthetic data that lacks the semantic complexity of the physical world.

To bridge this gap, we curate MetricScenes, a metrically grounded, in-the-wild dataset gathered from a variety of sources, including Internet photo collections and stereo imagery. For each scene, we estimate camera poses and initial depth maps with off-the-shelf methods, then recover absolute scale from geo-tagged metadata and from known stereo camera baselines. We further improve the quality of the depth maps derived from MetricScenes with a new two-stage Poisson completion method. Fine-tuning MoGe-2 on the dataset significantly mitigates scale-collapse and improves metric accuracy in unconstrained, open-domain scenes while maintaining state-of-the-art performance on standard benchmarks.
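
The abstract does not say how absolute scale is recovered in practice, so the following is only a minimal sketch of two standard ideas it could rest on, not the authors' pipeline. The first function assumes camera centers from an up-to-scale reconstruction and matching geotag positions already projected into a local metric frame, and takes the median ratio of pairwise distances as the scale factor. The second applies the textbook rectified-stereo relation Z = f * B / d. The function names, argument names, and the choice of a median are all assumptions made for illustration.

```python
import math
from itertools import combinations
from statistics import median


def metric_scale_from_geotags(sfm_centers, geo_positions_m):
    """Estimate the factor that converts reconstruction units to meters.

    sfm_centers: camera centers in the arbitrary units of the reconstruction.
    geo_positions_m: the same cameras' geotag positions, assumed to be already
        projected into a local metric frame (for example east/north/up).
    The median of pairwise distance ratios is used so that a few noisy
    geotags do not dominate the estimate (an illustrative choice).
    """
    ratios = []
    for i, j in combinations(range(len(sfm_centers)), 2):
        d_sfm = math.dist(sfm_centers[i], sfm_centers[j])
        d_geo = math.dist(geo_positions_m[i], geo_positions_m[j])
        if d_sfm > 1e-9:  # skip coincident cameras
            ratios.append(d_geo / d_sfm)
    if not ratios:
        raise ValueError("need at least two distinct camera centers")
    return median(ratios)


def stereo_depth_m(disparity_px, focal_px, baseline_m):
    """Depth in meters for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Hypothetical toy data: the geotag frame is the reconstruction scaled by 5.
    sfm = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
    geo = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
    scale = metric_scale_from_geotags(sfm, geo)
    print(f"meters per reconstruction unit: {scale:.3f}")
    print(f"stereo depth: {stereo_depth_m(10.0, 1000.0, 0.5):.1f} m")
```

Multiplying an up-to-scale depth map by the estimated factor would put it in meters. With a known stereo baseline the depth is metric directly, with no geotags required.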


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
