arXiv

SEArch: Optimistic Policy Selection Between Scene Noise and Drift for UAV Radar Search

Title: SEArch: Balancing Scene Noise and Drift in Optimistic Policy Selection for UAV Radar Search

Abstract:

Unmanned Aerial Vehicles (UAVs) equipped with radar sensors are increasingly used for target search across varied terrain. In these missions, targets often exhibit distinct signatures, such as the micro-motion associated with human respiration, that can be detected even through obstacles. However, the statistics of the radar data shift as the UAV moves through dynamic, potentially non-stationary environments. This variability renders static signal-processing strategies ineffective and calls for real-time perception and adaptation within the tight resource constraints of an onboard aerial processor.

Because no single detector excels in all conditions, we propose a multi-policy approach that frames UAV target search as an online policy selection task. The selector chooses from a library of specialized detectors, and its performance is measured by "regret": the cumulative loss difference relative to the optimal policy for each specific scene. Our model accounts for two distinct challenges: stochastic noise within scenes and shifts between scenes. Unlike previous methods that address only one of these regimes, we employ the Stochastically Extended Adversary (SEA) framework to handle both, without needing prior knowledge of scene dynamics.
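As a hedged illustration of the regret measure described above, the following Python sketch computes per-scene regret from a table of losses. The function name `scene_regret`, the layout of the loss table, and the assumption that scene boundaries are known after the fact (for evaluation only) are illustrative choices, not details taken from the paper.

```python
from typing import Sequence


def scene_regret(
    losses: Sequence[Sequence[float]],  # losses[t][k]: loss of policy k at round t (hypothetical layout)
    chosen: Sequence[int],              # chosen[t]: index of the policy selected at round t
    scene_starts: Sequence[int],        # first round of each scene, starting with 0 (assumed known for evaluation)
) -> float:
    """Sum, over scenes, of the incurred loss minus the loss of the best single policy in that scene."""
    horizon = len(losses)
    bounds = list(scene_starts) + [horizon]
    total = 0.0
    for start, end in zip(bounds[:-1], bounds[1:]):
        n_policies = len(losses[start])
        incurred = sum(losses[t][chosen[t]] for t in range(start, end))
        best = min(
            sum(losses[t][k] for t in range(start, end))
            for k in range(n_policies)
        )
        total += incurred - best
    return total


# Two policies, two scenes of two rounds each; the selector is slow to switch in scene 2.
example_losses = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.7, 0.3]]
example_regret = scene_regret(example_losses, chosen=[0, 0, 0, 1], scene_starts=[0, 2])
```

In this toy input the selector incurs no regret in the first scene and roughly 0.8 in the second, because it keeps using the first policy for one round after the scene changes.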

To facilitate onboard adaptation, we implement SEA via \textsc{SEArch}, a lightweight, optimistic Follow the Regularized Leader (OFTRL) selector featuring an adaptive learning rate. This approach yields a regret bound of $O(\bar{\sigma}_T \sqrt{T} + \sqrt{J})$, where $\bar{\sigma}_T$ represents radar measurement noise, $J$ denotes the number of scene transitions, and $T$ is the mission horizon. To further accelerate adaptation during periods of frequent scene changes, we introduce \textsc{W-SEArch}, a windowed variant that resets every $w$ rounds. This variant achieves a regret of $O(\bar{\sigma}_I \sqrt{w})$, assuming no more than one transition occurs per window. Experimental results demonstrate that our methods reduce regret by up to 30% compared to non-adaptive baselines across various non-stationary scenarios.
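The abstract does not specify the selector, so the following is only a minimal sketch of one standard instance of optimistic FTRL: an entropic regularizer over the policy simplex (often called optimistic Hedge), the previous loss vector used as the optimistic hint, and a learning rate that shrinks as hint error accumulates. The class name `OptimisticSelector`, the learning-rate formula, and the restart logic behind `window` are assumptions made for illustration; they should not be read as the \textsc{SEArch} or \textsc{W-SEArch} algorithms.

```python
import math
from typing import List, Optional, Sequence


class OptimisticSelector:
    """Illustrative optimistic FTRL (entropic regularizer) over a finite policy library."""

    def __init__(self, n_policies: int, window: Optional[int] = None):
        self.k = n_policies
        self.window = window  # restart period in rounds (hypothetical stand-in for w); None disables restarts
        self.reset()

    def reset(self) -> None:
        self.cum_loss = [0.0] * self.k  # cumulative observed loss per policy
        self.hint = [0.0] * self.k      # optimistic guess of the next loss vector
        self.hint_err = 0.0             # cumulative squared error of past hints
        self.rounds = 0                 # rounds since the last reset

    def weights(self) -> List[float]:
        # Assumed adaptive step size: smaller when past hints were inaccurate.
        eta = math.sqrt(math.log(self.k) / (1.0 + self.hint_err))
        scores = [-eta * (c + m) for c, m in zip(self.cum_loss, self.hint)]
        top = max(scores)  # subtract the max score for numerical stability
        exps = [math.exp(s - top) for s in scores]
        norm = sum(exps)
        return [e / norm for e in exps]

    def update(self, loss: Sequence[float]) -> None:
        self.hint_err += sum((l - m) ** 2 for l, m in zip(loss, self.hint))
        self.cum_loss = [c + l for c, l in zip(self.cum_loss, loss)]
        self.hint = list(loss)  # next hint: the most recent loss vector
        self.rounds += 1
        if self.window is not None and self.rounds >= self.window:
            self.reset()  # windowed variant: forget everything every `window` rounds


# Usage: query weights, pick a detector, then feed back the observed losses.
selector = OptimisticSelector(n_policies=2)
for _ in range(5):
    probabilities = selector.weights()
    selector.update([0.1, 0.9])
```

Using the last loss as the hint means the hint error stays small when losses within a scene are stable, which is the intuition behind a bound that scales with noise rather than with the horizon alone. The restart simply discards history, which trades some accuracy in long scenes for faster recovery after a transition.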


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
