Anchor3R: Streaming 3D Reconstruction with Transient Anchors for Long-Horizon Visual Mapping
Abstract:
Long-horizon online visual mapping for robotic perception requires continuously estimating camera motion and scene geometry from visual data under strict memory and compute limits. Although recent feed-forward 3D reconstruction models offer robust geometric priors, their streaming adaptations typically rely on a fixed coordinate system anchored to the initial frame or on a persistent scene memory. This static-gauge design creates a mismatch between training and test conditions, biases attention toward early anchors, and accumulates drift on sequences significantly longer than those seen during training.
To address these limitations, we introduce \emph{Anchor3R}, a streaming 3D reconstruction framework that redefines feed-forward reconstruction as the prediction of current-centric local measurements rather than persistent global-gauge regression. At every time step, the system predicts local pointmaps and window-relative poses within the coordinate system of the current frame, effectively transforming streaming reconstruction into the generation of relative-pose measurements. These measurements facilitate online pose updates, while techniques such as loop-closure reinsertion and motion averaging synchronize the trajectory and convert local pointmaps into a unified global reconstruction. Evaluations across indoor, outdoor, driving, and RGB-D benchmarks demonstrate that Anchor3R enhances both dense reconstruction quality and long-horizon pose accuracy compared to existing streaming baselines, all while maintaining support for online inference with bounded memory.
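As a rough illustration of how relative-pose measurements can drive online pose updates and place local pointmaps in a common frame, the sketch below chains 4x4 rigid transforms. It is not the Anchor3R implementation: it assumes each measurement is the pose of the current camera expressed in the previous camera's frame, it uses a yaw-only rotation for brevity, and it omits loop-closure reinsertion and motion averaging. The names `make_pose`, `matmul4`, `transform_points`, and `integrate` are hypothetical helpers introduced only for this example.

```python
import math


def matmul4(a, b):
    """Multiply two 4x4 homogeneous transforms given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def make_pose(yaw, t):
    """Build a 4x4 pose from a yaw angle (radians, about z) and a translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, t[0]],
            [s, c, 0.0, t[1]],
            [0.0, 0.0, 1.0, t[2]],
            [0.0, 0.0, 0.0, 1.0]]


def transform_points(pose, points):
    """Map 3D points from the local camera frame into the world frame."""
    return [tuple(sum(pose[i][k] * p[k] for k in range(3)) + pose[i][3]
                  for i in range(3))
            for p in points]


def integrate(rel_poses, local_pointmaps):
    """Chain relative poses and accumulate local pointmaps in the world frame.

    Assumption: rel_poses[i] is the pose of camera i expressed in the frame
    of the camera before it (the first one relative to the world origin).
    """
    world_from_cam = make_pose(0.0, (0.0, 0.0, 0.0))  # identity start
    trajectory, cloud = [], []
    for rel, pts in zip(rel_poses, local_pointmaps):
        # Online pose update: compose the previous global pose with the
        # newly measured relative pose.
        world_from_cam = matmul4(world_from_cam, rel)
        trajectory.append(world_from_cam)
        # Lift the current-centric pointmap into the shared world frame.
        cloud.extend(transform_points(world_from_cam, pts))
    return trajectory, cloud


if __name__ == "__main__":
    # Two identical steps: move 1 unit along x, then turn 90 degrees.
    step = make_pose(math.pi / 2, (1.0, 0.0, 0.0))
    traj, cloud = integrate([step, step], [[(1.0, 0.0, 0.0)]] * 2)
    for p in cloud:
        print([round(v, 6) for v in p])
```

Because every update is a plain composition, errors in the relative measurements accumulate along the chain, which is the kind of drift that the loop-closure and motion-averaging steps described above are meant to correct.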
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC




