arXiv

Aryabhata 2: Scaling Reinforcement Learning for Advanced STEM Reasoning

June 4, 2026 · Ritvik Rastogi, Vishal Singh, Tejas Chaudhari, Sandeep Varma

Abstract: Success in competitive STEM exams such as JEE and NEET depends on a student’s ability to execute multi-step symbolic reasoning, perform precise numerical calculations, and demonstrate a deep grasp of concepts spanning physics, chemistry, and mathematics. While recent large language models excel on standard reasoning benchmarks, deploying them at scale remains challenging, particularly when serving millions of domain-specific, structurally consistent student queries. To address this, we present Aryabhata 2, a reasoning-oriented language model tailored for competitive STEM assessments. The model is post-trained with reinforcement learning on high-quality training curricula derived from PhysicsWallah’s internal question banks. Specifically, we apply reinforcement learning with verifiable rewards to fine-tune the GPT-OSS-20B base model. Our training methodology combines extended reinforcement learning phases with enhanced exploration, enabled by progressively increasing rollout group sizes. We evaluate Aryabhata 2 on competitive exam benchmarks, including JEE Main, JEE Advanced, and NEET, and on out-of-distribution reasoning datasets such as AIME, HMMT, MMLU-Pro, MMLU-Redux 2.0, and GPQA. The results indicate that Aryabhata 2 surpasses its base model, GPT-OSS-20B, on competitive STEM reasoning tasks while significantly reducing computational overhead, requiring up to 64% fewer output tokens.
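As a rough illustration of the ideas named in the abstract, the sketch below shows one way a verifiable reward, a per-group advantage, and a growing rollout group size could fit together. The abstract does not specify the reward function, the policy-optimization algorithm, or the group sizes, so everything here is an assumption: the exact-match reward, the GRPO-style normalization within a group, the doubling schedule, and the names `verifiable_reward`, `group_advantages`, and `group_size_at` are all hypothetical.

```python
from statistics import mean, pstdev


def verifiable_reward(predicted: str, reference: str) -> float:
    """Return 1.0 when the final answer matches the reference, else 0.0.

    Assumption: answers are compared as normalized strings. A real checker
    for numeric or symbolic answers would need a more careful comparison.
    """
    return 1.0 if predicted.strip().lower() == reference.strip().lower() else 0.0


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Center and scale rewards within one rollout group.

    Assumption: a GRPO-style baseline, where each rollout is scored against
    the other rollouts sampled for the same question.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def group_size_at(step: int, total_steps: int, start: int = 8, end: int = 64) -> int:
    """Hypothetical schedule: double the group size in equal-length stages."""
    sizes = []
    size = start
    while size <= end:
        sizes.append(size)
        size *= 2
    # Map the current step to a stage, clamping at the final stage.
    stage = min(step * len(sizes) // total_steps, len(sizes) - 1)
    return sizes[stage]


if __name__ == "__main__":
    # One question, four sampled answers, one of which is correct.
    answers = ["12", "15", " 15 ", "7"]
    rewards = [verifiable_reward(a, "15") for a in answers]
    print(rewards)
    print(group_advantages(rewards))
    print([group_size_at(s, 100) for s in (0, 25, 50, 99)])
```

Under these assumptions, a larger group gives more samples per question, which raises the chance that at least one rollout solves a hard question and so yields a non-zero learning signal.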