SEArch: Optimistic Policy Selection Between Scene Noise and Drift for UAV Radar Search
Title: SEArch: Balancing Scene Noise and Drift in Optimistic Policy Selection for UAV Radar Search
Abstract:
Unmanned Aerial Vehicles (UAVs) equipped with radar sensors are increasingly utilized for target search operations across varied terrains. In these missions, targets often display distinct signatures—such as the micro-motion associated with human respiration—that can be detected even through obstacles. However, a significant hurdle lies in the shifting statistics of radar data as the UAV navigates dynamic, potentially non-stationary environments. This variability renders static signal-processing strategies ineffective, necessitating real-time perception and adaptation within the tight resource constraints of an onboard aerial processor.
Given that no single detector excels in all conditions, we propose a multi-policy approach, framing UAV target search as an online policy selection task. This involves choosing from a library of specialized detectors, with performance evaluated by "regret"—the cumulative loss difference compared to the optimal policy for each specific scene. Our model accounts for two distinct challenges: stochastic noise within scenes and shifts between scenes. Unlike previous methods that address only one regime, we employ the Stochastically Extended Adversary (SEA) framework to handle both, without needing prior knowledge of scene dynamics.
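The regret notion described above can be made concrete with a small sketch. It assumes losses are available for every policy in every round and that scene boundaries are known after the fact for evaluation purposes; the function name `per_scene_regret` and its arguments are hypothetical and do not come from the paper.

```python
def per_scene_regret(losses, chosen, scene_starts):
    """Cumulative regret against the best fixed policy within each scene.

    losses:       list of T rows, each a list of K per-policy losses (assumed known)
    chosen:       list of T indices, the policy selected in each round
    scene_starts: sorted round indices where a new scene begins (first entry is 0);
                  every scene is assumed to contain at least one round
    """
    horizon = len(losses)
    bounds = list(scene_starts) + [horizon]
    total = 0.0
    for start, end in zip(bounds[:-1], bounds[1:]):
        # Loss actually incurred by the selector inside this scene.
        incurred = sum(losses[t][chosen[t]] for t in range(start, end))
        # Loss of the single best policy for this scene, chosen in hindsight.
        num_policies = len(losses[start])
        best_fixed = min(
            sum(losses[t][k] for t in range(start, end))
            for k in range(num_policies)
        )
        total += incurred - best_fixed
    return total
```

For example, with two policies and a scene change at round 2, a selector that keeps using the first policy pays the full loss of the second scene, while one that switches at the boundary incurs zero regret.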
To facilitate onboard adaptation, we implement SEA via \textsc{SEArch}, a lightweight optimistic Follow the Regularized Leader (OFTRL) selector with an adaptive learning rate. This approach yields a regret bound of $O(\bar{\sigma}_T \sqrt{T} + \sqrt{J})$, where $T$ is the mission horizon, $\bar{\sigma}_T$ represents radar measurement noise, and $J$ denotes the number of scene transitions during the mission. To further accelerate adaptation during periods of frequent scene changes, we introduce \textsc{W-SEArch}, a windowed variant that resets every $w$ rounds. This variant achieves a regret of $O(\bar{\sigma}_I \sqrt{w})$, assuming no more than one transition occurs per window. Experimental results demonstrate that our methods reduce regret by up to 30% compared to non-adaptive baselines across various non-stationary scenarios.
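As an illustration of the general idea, the following is a minimal sketch of an optimistic FTRL selector over a finite policy library, with an optional periodic restart. It is not the paper's algorithm: it assumes an entropic regularizer (which gives exponential weights), uses the previous round's loss vector as the optimistic hint, and sets the learning rate from the accumulated hint error. The class name `OptimisticHedge` and all of these choices are assumptions made for this example.

```python
import math


class OptimisticHedge:
    """Optimistic FTRL with an entropic regularizer (illustrative sketch only)."""

    def __init__(self, num_policies, window=None):
        self.k = num_policies
        self.window = window  # None: never restart; w: restart every w rounds
        self.rounds = 0
        self._reset()

    def _reset(self):
        self.cum_loss = [0.0] * self.k  # cumulative loss per policy
        self.hint = [0.0] * self.k      # optimistic guess of the next loss
        self.err_sq = 0.0               # accumulated squared hint error

    def weights(self):
        # Assumed adaptive learning rate: shrinks as the hints prove less accurate.
        eta = math.sqrt(math.log(self.k) / (1.0 + self.err_sq))
        scores = [-eta * (c + m) for c, m in zip(self.cum_loss, self.hint)]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        norm = sum(exps)
        return [e / norm for e in exps]

    def update(self, loss):
        # Track how far the hint was from the observed loss vector.
        self.err_sq += max((l - m) ** 2 for l, m in zip(loss, self.hint))
        self.cum_loss = [c + l for c, l in zip(self.cum_loss, loss)]
        self.hint = list(loss)  # assumed hint: repeat the last observed loss
        self.rounds += 1
        # Windowed variant: forget all statistics every `window` rounds.
        if self.window is not None and self.rounds % self.window == 0:
            self._reset()
```

In use, a caller would sample or pick a detector from `weights()`, observe the loss vector for the round, and pass it to `update()`. Setting `window=w` mimics a restart every $w$ rounds, trading long-run memory for faster recovery after a scene change.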
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC





