arXiv

On the Generalization Gap in Self-Evolving Language Model Reasoning

Title: Investigating the Generalization Gap in Self-Evolving Language Model Reasoning

Abstract: Recent work suggests that large language models (LLMs) can improve themselves through self-evolution (SE), a process driven by supervision signals that the models generate themselves. This study examines how well such systems work in a strict closed-loop setting, in which the self-evolution algorithm has access only to an unlabeled prompt dataset and a base model. The central question is how closely internally generated supervision can approximate the performance of oracle-supervised training. We examine four methods within a unified offline self-evolution framework: single-round verification, iterative training, curriculum learning, and multi-turn revision with feedback. Our experiments primarily use Knights and Knaves (KK) logical reasoning tasks, chosen for their deterministic answers, adjustable difficulty, and suitability as a testbed for generalization from easy to hard problems. We find that self-evolution reliably improves on the baseline, but gains diminish as more compute is invested, and a substantial performance gap to oracle supervision remains. Multi-turn critic-revision with larger models performs best; for instance, Gemma 12B approaches the performance of oracle-supervised training. On real-world reasoning benchmarks, improvements remain limited. Together, these results delineate the limits of closed-loop self-evolution and indicate that internally derived supervision is insufficient in this minimal setting.
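
To illustrate why Knights and Knaves puzzles have deterministic answers and adjustable difficulty, the following is a minimal brute-force checker. It is only a sketch and not the paper's code: the function name `solve_kk`, the encoding of statements as Python callables, and the two-person puzzle are all assumptions made for illustration. Each person is either a knight (always tells the truth) or a knave (always lies), so an assignment is consistent exactly when every person's role matches the truth value of what they say.

```python
from itertools import product


def solve_kk(n, statements):
    """Return every knight/knave assignment consistent with all statements.

    statements[i] is a callable that takes the full assignment tuple
    (True = knight, False = knave) and returns the truth value of
    what person i says. This encoding is an illustrative assumption.
    """
    solutions = []
    for assignment in product([True, False], repeat=n):
        # A knight's statement must be true; a knave's must be false.
        if all(assignment[i] == statements[i](assignment) for i in range(n)):
            solutions.append(assignment)
    return solutions


# Hypothetical 2-person puzzle:
#   A says "B is a knave."
#   B says "A and B are both knights."
puzzle = [
    lambda a: not a[1],
    lambda a: a[0] and a[1],
]

solutions = solve_kk(2, puzzle)
print(solutions)  # [(True, False)]: A is a knight, B is a knave
```

The search space is 2^n assignments, so adding characters is one simple way to make puzzles harder, and a puzzle with exactly one consistent assignment can be graded automatically against that answer.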


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
