DINO-GFSA: Geo-Localization via Semantic Gated Fusion and Mamba-based Sequential Aggregation
Title: DINO-GFSA: Enhancing Geo-Localization through Semantic Gated Fusion and Mamba-Driven Sequential Aggregation
Abstract: In environments where Global Navigation Satellite System (GNSS) signals are unavailable, Cross-View Geo-Localization (CVGL) is a vital mechanism that lets Unmanned Aerial Vehicles (UAVs) determine their own position and locate targets. A central difficulty is maintaining robust semantic understanding while retaining detailed spatial information. To address this, we present DINO-GFSA, a novel framework built on a DINOv3 (ViT-L) backbone adapted via LoRA (Low-Rank Adaptation), which keeps the number of trainable parameters small while delivering high-capacity feature representations. Central to our method is a Semantic Gated Residual Fusion module, which uses high-level semantic features to selectively calibrate and merge low-level spatial features, thereby narrowing the semantic gap between them. Additionally, we develop a Mamba-based Sequential Aggregation Head that models long-range spatial dependencies with linear computational complexity. Experiments show state-of-the-art performance on both the University-1652 and DenseUAV benchmarks. Notably, the model exceeds the previous best result on the DenseUAV dataset by 3.48% in Recall@1. These findings indicate that DINO-GFSA offers a generalized and resilient solution for UAV-based CVGL tasks.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
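
The abstract describes the Semantic Gated Residual Fusion module only in words: high-level semantic features calibrate low-level spatial features before the two are merged. One possible reading is sketched below in plain Python. The formula `fused = high + sigmoid(W_g high + b_g) * (W_p low + b_p)`, the function names, and the toy dimensions are all assumptions made for illustration; the abstract does not specify the actual layer design, so this should not be taken as the authors' implementation.

```python
import math
import random


def sigmoid(x):
    """Logistic function, used to squash gate logits into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def linear(vec, weight, bias):
    """Compute W @ vec + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weight, bias)]


def gated_residual_fusion(high, low, w_gate, b_gate, w_proj, b_proj):
    """Hypothetical fusion: high + sigmoid(W_g high + b_g) * (W_p low + b_p).

    `high` and `low` are lists of per-token feature vectors of equal length.
    The gate is computed from the high-level (semantic) token only and
    decides, per channel, how much projected low-level detail is added back.
    """
    fused = []
    for h_tok, l_tok in zip(high, low):
        gate = [sigmoid(g) for g in linear(h_tok, w_gate, b_gate)]
        proj = linear(l_tok, w_proj, b_proj)
        fused.append([h + g * p for h, g, p in zip(h_tok, gate, proj)])
    return fused


def random_matrix(rows, cols, rng):
    """Small random matrix standing in for learned weights or features."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy sizes chosen for readability, not taken from the paper.
rng = random.Random(0)
dim, num_tokens = 4, 3
high = random_matrix(num_tokens, dim, rng)   # stand-in for deep-layer tokens
low = random_matrix(num_tokens, dim, rng)    # stand-in for shallow-layer tokens
w_gate = random_matrix(dim, dim, rng)
w_proj = random_matrix(dim, dim, rng)
zero_bias = [0.0] * dim

fused = gated_residual_fusion(high, low, w_gate, zero_bias, w_proj, zero_bias)
```

Under this reading, the residual form means the semantic features pass through unchanged wherever the gate closes, so the low-level branch can only add detail rather than overwrite the high-level representation.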





