arXiv

BiasGRPO: Stabilizing Bias Mitigation in High-Variance Reward Landscapes via Group-Relative Policy Optimization

Abstract

Addressing social bias in Large Language Models (LLMs) introduces a unique alignment dilemma. Because bias has no single, objective ground truth, the resulting reward landscape is high-variance and subjective, unlike that of tasks whose outputs can be verified. Existing preference-based fine-tuning techniques each involve a significant compromise: Direct Preference Optimization (DPO) suffers from constrained exploration because it trains offline, whereas Proximal Policy Optimization (PPO) often becomes unstable when its critic estimates are inaccurate.

To resolve these issues, we introduce BiasGRPO, a novel framework that applies Group Relative Policy Optimization (GRPO) to stabilize alignment. The method normalizes rewards across a set of sampled completions. Replacing the traditional value function with this group-relative baseline curbs instability while preserving the exploratory advantages of online training. In our evaluation, BiasGRPO surpasses both DPO and PPO across various benchmarks.
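
The group-relative baseline described above can be illustrated with a minimal sketch. It assumes the commonly used GRPO formulation in which each completion's reward is standardized against the mean and standard deviation of its own group; the abstract does not specify the exact normalization BiasGRPO uses, and the function name, the `eps` term, and the example rewards are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled completions.

    Assumption: advantage_i = (r_i - mean(group)) / (std(group) + eps).
    The group mean plays the role of the baseline, so no learned
    value function (critic) is needed.
    """
    if not rewards:
        raise ValueError("rewards must contain at least one value")
    baseline = mean(rewards)   # group-relative baseline
    spread = pstdev(rewards)   # population standard deviation of the group
    # eps keeps the division defined when every reward in the group is equal
    return [(r - baseline) / (spread + eps) for r in rewards]


# Hypothetical reward-model scores for four completions of one prompt
rewards = [0.2, 0.9, 0.4, 0.5]
advantages = group_relative_advantages(rewards)
```

Completions scored above the group mean receive positive advantages and those below receive negative ones, so the update signal depends only on how the completions compare with one another rather than on a critic's estimate.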

Furthermore, to adapt GRPO to this setting, we synthetically expanded a dataset to cover diverse domains and contexts. We also developed and publicly released a custom bias reward model. This model guides generation at minimal computational cost, prevents knowledge degradation, and can be integrated into multi-objective RLHF pipelines.


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
