arXiv

CIPER: A Unified Framework for Cross-view Image-retrieval and Pose-estimation

Abstract:

Cross-view geo-localization determines the geographic position of a ground-level photograph by comparing it with images from an aerial database. Current approaches generally address this challenge through either large-scale retrieval or high-precision pose estimation, but rarely both. Retrieval-centric methods allow for searches across wide areas but suffer from lower localization accuracy, whereas pose estimation techniques offer precise results but are restricted to narrow search spaces. Simply chaining these separate pipelines leads to error propagation and misaligned feature representations.

To address these limitations, we define cross-view geo-localization as a single, unified problem that demands simultaneous city-scale retrieval and accurate 3-DoF pose estimation. We introduce CIPER (Cross-view Image-retrieval and Pose-estimation transformER), a novel architecture that executes both tasks jointly via mutually reinforcing feature learning. CIPER employs a shared transformer encoder equipped with task-specific tokens, effectively separating global retrieval features from spatial localization cues.
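The idea of a shared encoder with task-specific tokens can be illustrated with a minimal sketch. It is not the CIPER implementation: it assumes a single self-attention layer with identity projections (no learned weights), and the names `encode_with_task_tokens`, `retrieval_token`, and `pose_token` are hypothetical. It only shows how two extra tokens, prepended to the image patch tokens, pass through the same encoder and are then read out separately for retrieval and for localization.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity projections (illustrative only)."""
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)])
    return out


def encode_with_task_tokens(patch_tokens, retrieval_token, pose_token):
    """Prepend two task tokens (hypothetical names), encode once, read each out."""
    sequence = [retrieval_token, pose_token] + patch_tokens
    encoded = self_attention(sequence)
    return {
        "retrieval": encoded[0],   # global descriptor for database search
        "pose": encoded[1],        # summary cue for localization
        "patches": encoded[2:],    # spatial features kept for a pose decoder
    }


rng = random.Random(0)
dim, num_patches = 8, 16
patches = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_patches)]
retrieval_token = [rng.gauss(0, 1) for _ in range(dim)]
pose_token = [rng.gauss(0, 1) for _ in range(dim)]

features = encode_with_task_tokens(patches, retrieval_token, pose_token)
print(len(features["retrieval"]), len(features["pose"]), len(features["patches"]))
```

Because both tokens attend to the same patches but start from different values, they aggregate the image differently, which is the intuition behind separating global retrieval features from spatial localization cues.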

To overcome the significant domain gap between ground and aerial perspectives, we present a two-way transformer pose decoder. This component leverages ground features as spatial queries to facilitate bidirectional cross-attention. Additionally, a set prediction strategy ensures stable 3-DoF regression within a unified multi-task objective. Evaluations on the VIGOR, KITTI, and Ford Multi-AV datasets show competitive results, particularly in scenarios involving limited fields of view and arbitrary orientations. The code is publicly accessible at https://github.com/yurimjeon1892/CIPER.
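Bidirectional cross-attention between ground and aerial features can be sketched as follows. This is an assumption-laden illustration rather than the paper's decoder: projections are identity, there is a single head, and the ordering (ground queries attend to aerial features first, then aerial features attend to the updated ground queries) is one plausible reading of "two-way". The function names `cross_attention` and `two_way_block` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Each query attends over keys_values; the result is added residually."""
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, kv)) for kv in keys_values]
        weights = softmax(scores)
        attended = [sum(w * kv[d] for w, kv in zip(weights, keys_values)) for d in range(dim)]
        out.append([qi + ai for qi, ai in zip(q, attended)])
    return out


def two_way_block(ground, aerial):
    """One two-way step (assumed ordering): ground -> aerial, then aerial -> ground."""
    ground_updated = cross_attention(ground, aerial)
    aerial_updated = cross_attention(aerial, ground_updated)
    return ground_updated, aerial_updated


# Toy deterministic features: 3 ground tokens and 5 aerial tokens, dimension 4.
ground = [[math.sin(i + d) for d in range(4)] for i in range(3)]
aerial = [[math.cos(0.5 * j + d) for d in range(4)] for j in range(5)]

ground_out, aerial_out = two_way_block(ground, aerial)
print(len(ground_out), len(aerial_out), len(ground_out[0]))
```

Both token sets keep their shapes, so several such blocks could be stacked before a regression head predicts the 3-DoF pose; how CIPER does this in detail is not specified in the abstract.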


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
