BA-T: An Iterative Transformer for Two-View Bundle Adjustment
Abstract:
Feed-forward models for 3D reconstruction use deep cross-view attention to exchange information between images, yet they frequently suffer from poor multi-view consistency. This limitation stems from their reliance on large decoder stacks and the absence of a structured mechanism for refining geometry. To overcome these challenges, we draw on classical bundle adjustment (BA), viewing it as an iterative process of information propagation between local geometry and camera poses.
Building on this insight, we introduce BA-T, an iterative Transformer that embeds BA-style structured updates as a reusable layer in an implicit token space. Instead of stacking deep attention networks, BA-T refines its predictions with a single lightweight layer that updates latent residuals. Our experiments show that pose estimation and reconstruction accuracy both improve with each iteration. BA-T also delivers better cross-view consistency than standard decoders and matches or outperforms significantly larger models while using only 16% of their decoder parameters. It thus offers a compact, efficient, and structurally grounded alternative to parameter-heavy attention mechanisms, enabling accurate 3D reconstruction with a streamlined architecture. The source code will be available at https://github.com/zhangganlin/BA-T.
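To make the idea of a reusable refinement layer concrete, the following is a minimal sketch and not the BA-T implementation. It assumes two sets of latent tokens (one per view) and a single weight-shared cross-view attention layer that predicts a residual update at every iteration. The names `SharedRefinementLayer` and `refine`, the token sizes, and the random weights are all hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class SharedRefinementLayer:
    """Hypothetical single layer reused at every iteration (illustrative only)."""

    def __init__(self, dim, rng):
        scale = 1.0 / np.sqrt(dim)
        self.wq = rng.normal(0.0, scale, (dim, dim))
        self.wk = rng.normal(0.0, scale, (dim, dim))
        self.wv = rng.normal(0.0, scale, (dim, dim))
        # Small output projection so each residual update stays modest.
        self.wo = 0.1 * rng.normal(0.0, scale, (dim, dim))

    def residual(self, own, other):
        """Residual for `own` tokens, computed by attending to `other` tokens."""
        q = own @ self.wq
        k = other @ self.wk
        v = other @ self.wv
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
        return (attn @ v) @ self.wo


def refine(tokens_a, tokens_b, layer, num_iters):
    """Apply the same layer num_iters times, adding residuals to both views."""
    for _ in range(num_iters):
        # Compute both residuals from the current state before updating either view.
        delta_a = layer.residual(tokens_a, tokens_b)
        delta_b = layer.residual(tokens_b, tokens_a)
        tokens_a = tokens_a + delta_a
        tokens_b = tokens_b + delta_b
    return tokens_a, tokens_b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layer = SharedRefinementLayer(dim=8, rng=rng)
    view_a = rng.normal(size=(5, 8))  # 5 tokens for view A (assumed size)
    view_b = rng.normal(size=(7, 8))  # 7 tokens for view B (assumed size)
    out_a, out_b = refine(view_a, view_b, layer, num_iters=4)
    print(out_a.shape, out_b.shape)
```

Because the same weights are reused, the parameter count in this sketch does not grow with the number of iterations, which is the property that separates an iterative layer from a deep decoder stack. How BA-T structures its updates between local geometry and camera poses is not specified in the abstract and is not modeled here.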
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC





