arXiv

GeoMin: Data-Efficient Semi-Supervised RLVR via Geometric Distribution Modeling

June 4, 2026 · Guangcheng Zhu, Shenzhi Yang, Haobo Wang, Xing Zheng, Yingfan MA, Xuening Feng, Zhongqi Chen, Kai Tang, Zhengqing Zang, Bowen Song, Weiqiang Wang, Gang Chen

Title: GeoMin: Geometric Distribution Modeling for Data-Efficient Semi-Supervised RLVR

Abstract: While Reinforcement Learning with Verifiable Rewards (RLVR) has markedly improved the reasoning capabilities of Large Language Models (LLMs), it encounters a critical trade-off. Standard supervised approaches are hindered by prohibitive annotation expenses, whereas unsupervised methods are prone to significant model collapse. Recent semi-supervised RLVR strategies attempt to resolve this by leveraging a limited labeled dataset to steer the learning process on unlabeled data, striking a favorable balance between training performance and annotation costs. Nevertheless, these existing methods remain constrained by a substantial data-efficiency bottleneck, primarily because they depend on coarse performance heuristics. This reliance results in the underutilization of the majority of valuable data instances.

To overcome this limitation, we introduce GeoMin, a novel approach that models global feature distributions derived from labeled data. By decoding the structural differences between correct and incorrect rollouts, GeoMin establishes a robust prior for evaluating the reliability of self-reward signals, thereby maximizing the utility of unlabeled data. Our empirical results demonstrate that GeoMin exceeds the performance of the strongest baselines by 4.1%. Notably, it outperforms fully supervised models while using only 10% of the annotations, highlighting its exceptional data efficiency.
