arXiv

GeoMin: Data-Efficient Semi-Supervised RLVR via Geometric Distribution Modeling

Title: GeoMin: Geometric Distribution Modeling for Data-Efficient Semi-Supervised RLVR

Abstract: While Reinforcement Learning with Verifiable Rewards (RLVR) has markedly improved the reasoning capabilities of Large Language Models (LLMs), it encounters a critical trade-off. Standard supervised approaches are hindered by prohibitive annotation expenses, whereas unsupervised methods are prone to significant model collapse. Recent semi-supervised RLVR strategies attempt to resolve this by leveraging a limited labeled dataset to steer the learning process on unlabeled data, striking a favorable balance between training performance and annotation costs. Nevertheless, these existing methods remain constrained by a substantial data-efficiency bottleneck, primarily because they depend on coarse performance heuristics. This reliance results in the underutilization of the majority of valuable data instances.

To overcome this limitation, we introduce GeoMin, a novel approach that models global feature distributions derived from labeled data. By decoding the structural differences between correct and incorrect rollouts, GeoMin establishes a robust prior for evaluating the reliability of self-reward signals, thereby maximizing the utility of unlabeled data. Our empirical results demonstrate that GeoMin exceeds the performance of the strongest baselines by 4.1%. Notably, it achieves superior results to fully supervised models while utilizing only 10% of the required annotations, highlighting its exceptional data efficiency.
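The abstract does not say which features GeoMin extracts or how it models their distributions, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes each rollout is summarized by a small numeric feature vector, fits one diagonal Gaussian to correct labeled rollouts and another to incorrect ones, and turns the log-likelihood ratio into a reliability score that down-weights a self-reward on unlabeled data. All function names, the feature values, and the Gaussian choice are hypothetical.

```python
import math
from statistics import fmean, pvariance

# Hypothetical sketch: none of these names or modeling choices come from the paper.

def fit_diag_gaussian(rows):
    """Fit a per-dimension mean and variance to a list of feature vectors."""
    dims = list(zip(*rows))
    means = [fmean(d) for d in dims]
    # A small floor keeps the variance strictly positive.
    variances = [pvariance(d, mu) + 1e-6 for d, mu in zip(dims, means)]
    return means, variances

def log_density(x, params):
    """Log-density of x under a diagonal Gaussian."""
    means, variances = params
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )

def reliability(x, correct_params, incorrect_params):
    """Sigmoid of the log-likelihood ratio, computed in a numerically stable way."""
    llr = log_density(x, correct_params) - log_density(x, incorrect_params)
    if llr >= 0:
        return 1.0 / (1.0 + math.exp(-llr))
    e = math.exp(llr)
    return e / (1.0 + e)

# Made-up features of labeled rollouts (assumption: two features per rollout).
correct_feats = [[0.9, 0.1], [0.8, 0.2], [1.0, 0.15]]
incorrect_feats = [[0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]

correct_params = fit_diag_gaussian(correct_feats)
incorrect_params = fit_diag_gaussian(incorrect_feats)

# An unlabeled rollout with a self-assigned reward of 1.0 (assumption).
unlabeled_feat = [0.85, 0.15]
self_reward = 1.0
weighted_reward = reliability(unlabeled_feat, correct_params, incorrect_params) * self_reward
```

In this sketch, a rollout whose features resemble the correct labeled rollouts keeps most of its self-reward, while one that resembles the incorrect ones is pushed toward zero. A real system would likely need richer features and a more expressive distribution model than a diagonal Gaussian.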


Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
