arXiv

Policy Split: Incentivizing Dual-Mode Exploration in LLM Reinforcement with Dual-Mode Entropy Regularization

Abstract:

We introduce Policy Split, a framework designed to foster diverse exploration in reinforcement learning (RL) for large language models (LLMs) while maintaining high accuracy. The approach splits the policy into two modes, a normal mode and a high-entropy mode, with the high-entropy mode elicited by a high-entropy prompt. Although both modes share the same model parameters, they are trained under a collaborative dual-mode entropy regularization scheme tailored to each mode's goal: the normal mode optimizes task correctness, while the high-entropy mode prioritizes exploration, so the two modes learn jointly. Extensive experiments show that Policy Split consistently outperforms established entropy-guided RL baselines across model sizes on both general and creative tasks. Further analysis indicates that the method enables dual-mode exploration: the high-entropy mode produces behavioral patterns distinct from those of the normal mode, providing unique learning signals.
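The abstract does not state the exact objective, so the following is only a minimal sketch, assuming a REINFORCE-style policy-gradient loss with a per-mode entropy bonus, of how one set of shared parameters could be trained in two prompt-switched modes. The prefix text `HIGH_ENTROPY_PREFIX`, the helpers `build_prompt`, `mode_loss`, and `dual_mode_loss`, and the coefficients `coef_normal` and `coef_high` are all hypothetical and are not the authors' formulation. The arithmetic is plain Python on scalars; a real trainer would compute the same quantities on tensors with automatic differentiation.

```python
import math
from typing import Sequence, Tuple

# Hypothetical instruction used only to illustrate prompt-switched modes;
# the paper's actual high-entropy prompt is not given in the abstract.
HIGH_ENTROPY_PREFIX = "Explore unusual but valid solution paths.\n"

# (log-probs of sampled tokens, per-token advantages, per-token distributions)
Batch = Tuple[Sequence[float], Sequence[float], Sequence[Sequence[float]]]


def build_prompt(question: str, high_entropy: bool) -> str:
    """Same model, two modes: only the prompt differs (hypothetical helper)."""
    return (HIGH_ENTROPY_PREFIX + question) if high_entropy else question


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mode_loss(logprobs: Sequence[float], advantages: Sequence[float],
              dists: Sequence[Sequence[float]], entropy_coef: float) -> float:
    """REINFORCE-style loss with an entropy bonus, averaged over tokens.

    With autograd, minimizing this value would raise the log-probability of
    tokens with positive advantage and, when entropy_coef > 0, push the
    next-token distributions toward higher entropy.
    """
    n = len(logprobs)
    pg = -sum(lp * a for lp, a in zip(logprobs, advantages)) / n
    mean_entropy = sum(entropy(d) for d in dists) / n
    return pg - entropy_coef * mean_entropy


def dual_mode_loss(normal: Batch, high: Batch,
                   coef_normal: float = 0.0, coef_high: float = 0.01) -> float:
    """Sum of per-mode losses; both terms would update the same parameters.

    The coefficient defaults are placeholders, not values from the paper.
    """
    return (mode_loss(*normal, entropy_coef=coef_normal)
            + mode_loss(*high, entropy_coef=coef_high))


# Toy usage: one sampled token with probability 0.5 and advantage 1.0.
toy = ([math.log(0.5)], [1.0], [[0.5, 0.5]])
print(round(dual_mode_loss(toy, toy), 4))
```

The sketch only varies the entropy weight between modes; the collaborative coupling between the two modes that the abstract mentions is not described there and is not modeled here.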


Source: arXiv. Generated at: 2026-06-04 00:00:00 UTC
