UR$^2$: Unify RAG and Reasoning through Reinforcement Learning
Abstract
Large Language Models (LLMs) have demonstrated significant proficiency by leveraging two distinct yet complementary approaches: Retrieval-Augmented Generation (RAG) to ground knowledge, and Reinforcement Learning from Verifiable Rewards (RLVR) to facilitate complex reasoning. Despite this, current efforts to merge these paradigms are often restricted in scope, typically focusing on open-domain question answering with static retrieval configurations. This limitation hinders their ability to generalize across a wider array of domains.
To overcome these challenges, we introduce UR$^2$ (Unified RAG and Reasoning), a general reinforcement learning framework that dynamically coordinates retrieval and reasoning. UR$^2$ has two key components: a difficulty-aware curriculum that invokes retrieval only for difficult problems, and a hybrid knowledge access method that combines domain-specific offline databases with summaries generated by the LLM in real time. Together, these mechanisms balance retrieval against reasoning and improve robustness to noisy data.
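As a rough illustration of the two components described above, the following minimal sketch gates retrieval on a difficulty signal and passes offline passages through a summarizer. It is not the UR$^2$ implementation: the `pass_rate` difficulty signal, the 0.5 threshold, the word-overlap ranking, and all function names are assumptions introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    question: str
    # Hypothetical difficulty signal: fraction of no-retrieval rollouts
    # that answered this question correctly (an assumption of this sketch).
    pass_rate: float


def needs_retrieval(example: Example, threshold: float = 0.5) -> bool:
    """Treat a question as hard, and allow retrieval, when pass_rate is below threshold."""
    return example.pass_rate < threshold


def hybrid_lookup(
    query: str,
    corpus: List[str],
    summarize: Callable[[str, List[str]], str],
    top_k: int = 2,
) -> str:
    """Rank offline passages by naive word overlap, then condense the top_k with a summarizer."""
    query_terms = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda passage: len(query_terms & set(passage.lower().split())),
        reverse=True,
    )
    # `summarize` stands in for an LLM call; any (query, passages) -> str callable works.
    return summarize(query, ranked[:top_k])


def build_context(
    example: Example,
    corpus: List[str],
    summarize: Callable[[str, List[str]], str],
    threshold: float = 0.5,
    top_k: int = 2,
) -> str:
    """Return retrieved context for hard questions and an empty string for easy ones."""
    if not needs_retrieval(example, threshold):
        return ""
    return hybrid_lookup(example.question, corpus, summarize, top_k)
```

In this sketch, an easy question (high `pass_rate`) receives no retrieved context and must be answered by reasoning alone, while a hard question receives a summary of the best-matching offline passages. A real system would replace the word-overlap ranking with a proper retriever and the `summarize` callable with an LLM call.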
Empirical evaluations across open-domain QA, MMLU-Pro, medical, and mathematical reasoning benchmarks reveal that UR$^2$, implemented on Qwen-2.5-3/7B and LLaMA-3.1-8B architectures, consistently surpasses current RAG and RL baselines. Furthermore, it delivers performance levels on par with GPT-4o-mini and GPT-4.1-mini on various benchmarks. The source code for this project is publicly accessible at https://github.com/Tsinghua-dhy/UR2.
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



