arXiv

Self-Evolving Deep Research via Joint Generation and Evaluation

Original: arXiv:2606.04507v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly adopted in daily applications, with deep research standing out as a particularly important capability. Unlike traditional question-answering (QA) tasks, deep research report generation lacks a definitive ground truth, making rewards inherently unverifiable and limiting effective reinforcement learning. Existing approaches mitigate this challenge with LLM-as-a-judge and query-dependent evaluation rubrics, but they still rely on static evaluators that cannot adapt their standards as the solver improves, leading to insufficient and eventually saturated optimization pressure. We address this limitation with a self-evolving co-evolutionary training framework for deep research evaluation and generation (SCORE), which tightly couples an evaluator and a solver in a shared-parameter learning process. Rather than treating generation and evaluation as isolated modules, we leverage their intrinsic connection to enable joint improvement within a single shared-parameter model. To constrain this process, we introduce a meta-harness, which dynamically controls the evaluation environment based on solver performance, encouraging valid evaluation dimensions and sufficiently deep evaluator search. Extensive experiments on deep research benchmarks demonstrate consistent improvement in report generation quality, showing that co-evolving evaluation and generation is a promising direction for training open-ended research agents.

Rewrite: Large Language Models (LLMs) are increasingly integrated into everyday applications, and deep research has emerged as a critical capability. Unlike standard question-answering (QA) tasks, deep research report generation has no definitive ground truth. This makes rewards unverifiable and limits the effectiveness of reinforcement learning. Current methods address the problem with LLM-as-a-judge mechanisms and query-specific evaluation rubrics, but they depend on static evaluators. These evaluators cannot raise their standards as the solver improves, so the optimization signal is insufficient at first and eventually saturates. To overcome this bottleneck, the authors propose SCORE, a self-evolving co-evolutionary training framework for deep research evaluation and generation. The framework couples an evaluator and a solver in a shared-parameter learning process. Instead of treating generation and evaluation as separate modules, it exploits their intrinsic connection so that both improve jointly within a single model. A meta-harness constrains this process by adjusting the evaluation environment according to the solver's performance, which encourages valid evaluation dimensions and sufficiently deep evaluator search. Experiments on deep research benchmarks show consistent improvements in report quality, indicating that co-evolving evaluation and generation is a promising direction for training open-ended research agents.
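
The abstract states that a meta-harness adjusts the evaluation environment based on solver performance, but it gives no implementation details. The following is a purely illustrative sketch of that idea, not the authors' method: the class name `MetaHarness`, its fields (`num_dimensions`, `search_depth`, `raise_threshold`, `window`), and the "raise standards when the recent mean score is high" rule are all assumptions made up for this example.

```python
from collections import deque
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class MetaHarness:
    """Hypothetical controller that tightens evaluation as the solver improves.

    Every field here is an assumption for illustration; the abstract does not
    specify how the meta-harness is parameterized.
    """

    num_dimensions: int = 3        # rubric dimensions the evaluator must cover
    search_depth: int = 1          # evaluator search steps per report
    raise_threshold: float = 0.8   # recent mean score that triggers tightening
    window: int = 4                # number of recent scores to average
    max_dimensions: int = 8
    max_depth: int = 5
    _recent: deque = field(default_factory=deque)

    def update(self, solver_score: float) -> None:
        """Record one solver score in [0, 1] and tighten standards if warranted."""
        self._recent.append(solver_score)
        if len(self._recent) > self.window:
            self._recent.popleft()

        window_full = len(self._recent) == self.window
        if window_full and mean(self._recent) >= self.raise_threshold:
            # The solver is close to saturating the current standard, so demand
            # more rubric dimensions and a deeper evaluator search.
            self.num_dimensions = min(self.num_dimensions + 1, self.max_dimensions)
            self.search_depth = min(self.search_depth + 1, self.max_depth)
            # Start a fresh window so the harder setting is judged on its own.
            self._recent.clear()


harness = MetaHarness()
for score in [0.9, 0.85, 0.9, 0.95]:
    harness.update(score)
print(harness.num_dimensions, harness.search_depth)  # 4 2
```

A static evaluator corresponds to never calling `update`: the standard stays fixed and a strong solver's scores plateau near the top. In this toy version the harness raises the bar once the recent average crosses the threshold, which is one simple way to keep optimization pressure from saturating.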


Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
