arXiv

Aryabhata 2: Scaling Reinforcement Learning for Advanced STEM Reasoning

Abstract: Success in competitive STEM exams such as JEE and NEET depends on a student’s ability to carry out multi-step symbolic reasoning, perform precise numerical calculations, and demonstrate a deep grasp of concepts spanning physics, chemistry, and mathematics. Although recent large language models excel on standard reasoning benchmarks, deploying them at scale remains challenging, particularly when serving millions of domain-specific, structurally consistent student queries. To address this, we present Aryabhata 2, a reasoning-oriented language model tailored for competitive STEM assessments. The model is post-trained with reinforcement learning on high-quality training curricula derived from PhysicsWallah’s internal question banks. Specifically, we apply reinforcement learning with verifiable rewards to fine-tune the GPT-OSS-20B base model. Our training methodology combines extended reinforcement learning phases with enhanced exploration, achieved by progressively increasing rollout group sizes. We evaluate Aryabhata 2 on competitive exam benchmarks, including JEE Main, JEE Advanced, and NEET, as well as on out-of-distribution reasoning datasets such as AIME, HMMT, MMLU-Pro, MMLU-Redux 2.0, and GPQA. The results indicate that Aryabhata 2 outperforms its base model, GPT-OSS-20B, on competitive STEM reasoning tasks while substantially reducing computational overhead, requiring up to 64% fewer output tokens.
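The abstract states only that training uses reinforcement learning with verifiable rewards and progressively larger rollout groups. The sketch below illustrates one common way these pieces fit together, under loudly stated assumptions: a binary answer-matching reward, a GRPO-style group-normalized advantage (the abstract does not name the algorithm), and a hypothetical group-size schedule (`GROUP_SIZE_SCHEDULE`, with made-up step boundaries and sizes). All function names are illustrative, not the authors' code. The last helper shows why larger groups can aid exploration: with binary rewards, a group in which every rollout is right, or every rollout is wrong, yields zero advantage, and the probability of a mixed group, 1 - p^G - (1 - p)^G, grows with the group size G on hard problems with low success rate p.

```python
import math
import statistics
from typing import List, Tuple

# Hypothetical schedule: (first training step, rollout group size G).
# The abstract says only that group sizes increase; these values are illustrative.
GROUP_SIZE_SCHEDULE: List[Tuple[int, int]] = [(0, 8), (1000, 16), (2000, 32)]


def verifiable_reward(predicted: str, reference: str, rel_tol: float = 1e-2) -> float:
    """Binary reward: 1.0 if the final answer matches the reference, else 0.0.

    Numeric answers are compared within a relative tolerance (an assumed value);
    other answers, such as multiple-choice letters, are compared case-insensitively.
    """
    try:
        p, r = float(predicted), float(reference)
    except ValueError:
        return 1.0 if predicted.strip().upper() == reference.strip().upper() else 0.0
    return 1.0 if math.isclose(p, r, rel_tol=rel_tol, abs_tol=1e-9) else 0.0


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage (assumed): each reward normalized by its group's mean and std.

    If all rewards in the group are equal, every advantage is 0 and the group
    contributes no learning signal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def rollout_group_size(step: int, schedule: List[Tuple[int, int]]) -> int:
    """Look up the group size for a training step in a piecewise-constant schedule."""
    size = schedule[0][1]
    for start_step, group_size in schedule:
        if step >= start_step:
            size = group_size
    return size


def prob_nonzero_signal(p: float, group_size: int) -> float:
    """Probability that G i.i.d. rollouts with success rate p include both a correct
    and an incorrect answer, i.e. that group-normalized advantages are non-zero."""
    return 1.0 - p ** group_size - (1.0 - p) ** group_size


if __name__ == "__main__":
    rewards = [verifiable_reward(a, "9.8") for a in ["9.81", "10.5", "9.8", "B"]]
    print(rewards)  # [1.0, 0.0, 1.0, 0.0]
    print(group_advantages(rewards))
    for step in (0, 1500, 2500):
        g = rollout_group_size(step, GROUP_SIZE_SCHEDULE)
        # For a hard problem with p = 0.05: about 0.34 at G = 8, about 0.81 at G = 32.
        print(step, g, round(prob_nonzero_signal(0.05, g), 3))
```

Under these assumptions, a problem the policy solves only 5% of the time gives a usable gradient in roughly a third of groups at G = 8 but in roughly four fifths at G = 32, which is one plausible reading of why growing the rollout group is described as enhancing exploration.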


Source: arXiv. Generated at: 2026-06-04 00:00:00 UTC