arXiv

Self-Refining Agentic Reinforcement Learning for Vision-Conditioned UAV Navigation

June 3, 2026 · Roohan Ahmed Khan, Yasheerah Yaqoot, Muhammad Ahsan Mustafa, Dzmitry Tsetserukou

Abstract:

While deep reinforcement learning holds significant promise for equipping autonomous robots with the ability to master complex navigation challenges, its real-world application remains hindered by a heavy reliance on manually crafted reward functions and extensive, time-intensive human tuning. These traditional methods often fail to ensure high success rates in target tasks. To address these limitations, this study introduces AgenticRL, a novel reinforcement learning framework that enhances autonomy in reward formulation, policy optimization, and the deployment of unmanned aerial vehicles (UAVs).

At the core of AgenticRL is a multimodal generative pre-trained transformer (GPT) agent capable of synthesizing visual scene data and natural language task instructions. This agent automates the creation of task-specific reward functions and employs the Proximal Policy Optimization (PPO) algorithm for policy training. Subsequently, it functions as a critic, assessing the trained policy via diagnostic packets to produce constructive feedback. This feedback loop enables the agent to pinpoint failure modes and iteratively refine the reward function, establishing a self-improving cycle. During the inference phase, the framework utilizes real-world imagery and linguistic task descriptions to dynamically detect the current scenario and select the most suitable pre-trained policy for execution.
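
The closed-loop cycle described above (propose a reward, train a policy, critique it from diagnostics, refine the reward) can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`Diagnostics`, `refine_reward`, `propose_reward`, `train_and_evaluate`, the stopping threshold, and the iteration cap) is a hypothetical placeholder, and the GPT agent and PPO training are replaced by toy stubs so that the control flow can run on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Diagnostics:
    """Hypothetical stand-in for a 'diagnostic packet' from policy rollouts."""
    success_rate: float
    failure_modes: List[str]


@dataclass
class RefinementResult:
    reward_spec: str
    diagnostics: Diagnostics
    iterations: int


def refine_reward(
    propose_reward: Callable[[str, List[str]], str],
    train_and_evaluate: Callable[[str], Diagnostics],
    task: str,
    target_success: float = 0.9,   # assumed stopping threshold
    max_iterations: int = 5,       # assumed iteration cap
) -> RefinementResult:
    """Iterate reward proposal -> training -> critique until the target is met."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")

    feedback: List[str] = []
    best: Optional[RefinementResult] = None
    for i in range(1, max_iterations + 1):
        # In the paper's setting an agent would write the reward function here.
        reward_spec = propose_reward(task, feedback)
        # In the paper's setting this step would be PPO training plus evaluation.
        diagnostics = train_and_evaluate(reward_spec)

        if best is None or diagnostics.success_rate > best.diagnostics.success_rate:
            best = RefinementResult(reward_spec, diagnostics, i)
        if diagnostics.success_rate >= target_success:
            break

        # Accumulate newly reported failure modes as feedback for the next proposal.
        feedback = feedback + [m for m in diagnostics.failure_modes if m not in feedback]

    assert best is not None
    return best


# Toy stubs, for illustration only.
def stub_propose_reward(task: str, feedback: List[str]) -> str:
    # Add one penalty term per failure mode reported so far.
    return " + ".join(["progress_to_goal"] + [f"penalize_{m}" for m in feedback])


def stub_train_and_evaluate(reward_spec: str) -> Diagnostics:
    # Pretend each missing penalty term costs 0.4 success rate,
    # and that the critic reports only one failure mode per round.
    known = ["collision", "overshoot"]
    remaining = [m for m in known if f"penalize_{m}" not in reward_spec]
    return Diagnostics(1.0 - 0.4 * len(remaining), remaining[:1])


result = refine_reward(stub_propose_reward, stub_train_and_evaluate, "fly through the gate")
print(result.iterations, result.reward_spec)
```

In this toy setup the loop stops once the stubbed success rate reaches the threshold; in a real system the stopping rule, the content of the diagnostics, and the form of the reward function would all depend on the task and the training pipeline.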

The efficacy of AgenticRL was assessed across a diverse range of navigational challenges, such as gate traversal, obstacle avoidance, wall barrier crossing with landing, trajectory following, and motion behavior learning. Results indicate that the closed-loop refinement mechanism boosts policy performance by 71% compared to initial reward settings. Furthermore, the study validates the framework’s sim-to-real transfer capability, demonstrating a 94% accuracy rate in simulation-to-reality mapping and achieving a 91% success rate in physical world operations.

Source: arXiv Generated at: 2026-06-03 00:00:00 UTC