arXiv

Recovering Diversity Without Losing Alignment: A DPO Recipe for Post-Trained LLMs

Title: Restoring Diversity in Post-Trained LLMs Without Compromising Alignment: A DPO-Based Approach

Abstract: Many open-ended prompts admit multiple valid answers that users would benefit from seeing, yet post-training often narrows a Large Language Model's (LLM) outputs to a small set of canonical responses. To address this, we present REDIPO, a DPO-based (Direct Preference Optimization) offline data-construction pipeline that recovers distinct, valid answer modes while preserving the alignment benefits of the instruct model. For each prompt, REDIPO samples responses from both the base and instruct models, rewrites the base model's outputs with the instruct model, and filters the resulting candidates for safety and instruction-following quality. It then constructs preference pairs that favor marginally diverse responses among candidates with similar instruction-following rewards.
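The filtering and pair-construction steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Candidate` fields, the reward threshold `min_reward`, the reward tolerance `reward_band`, and the use of nearest-neighbour cosine distance as the "marginal diversity" score are all hypothetical choices. Sampling and rewriting are omitted; candidates are assumed to arrive already rewritten and scored. The `dpo_loss` helper shows the standard DPO objective for one pair, with `beta=0.1` as an assumed value.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Candidate:
    text: str
    reward: float           # instruction-following reward (hypothetical scorer output)
    is_safe: bool           # verdict of a safety filter (hypothetical)
    embedding: list[float]  # response embedding used to measure diversity (hypothetical)


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0  # treat a degenerate zero embedding as maximally distant
    return 1.0 - dot / (na * nb)


def marginal_diversity(i, pool):
    # Assumed definition: distance from candidate i to its nearest neighbour,
    # i.e. how much it adds beyond the other responses already in the pool.
    return min(cosine_distance(pool[i].embedding, pool[j].embedding)
               for j in range(len(pool)) if j != i)


def build_pair(cands, min_reward=0.5, reward_band=0.1):
    # 1) Filter: keep safe candidates whose instruction-following reward is acceptable.
    pool = [c for c in cands if c.is_safe and c.reward >= min_reward]
    if len(pool) < 2:
        return None
    scores = [marginal_diversity(i, pool) for i in range(len(pool))]
    best = None
    # 2) Quality-bounded pairing: only compare candidates with similar rewards,
    #    then pick the pair with the largest marginal-diversity gap.
    for i, j in combinations(range(len(pool)), 2):
        if abs(pool[i].reward - pool[j].reward) > reward_band:
            continue
        gap = abs(scores[i] - scores[j])
        if best is None or gap > best[0]:
            chosen, rejected = (i, j) if scores[i] > scores[j] else (j, i)
            best = (gap, pool[chosen].text, pool[rejected].text)
    # Returns (chosen_text, rejected_text), or None if no pair qualifies.
    return None if best is None else (best[1], best[2])


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # Standard DPO loss for one pair: -log sigmoid(beta * (policy margin - reference margin)).
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return math.log1p(math.exp(-beta * margin))
```

Restricting comparisons to a reward band is what keeps the "chosen" response from winning merely because it is better at following instructions; within the band, the preference signal is driven by diversity alone.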

Evaluations on Qwen3-4B, OLMo-3-7B, and LLaMA-3.1-8B show that REDIPO increases NoveltyBench distinct_k scores by 134%, 33%, and 44%, respectively, relative to the instruct checkpoints, whereas DivPO changes diversity by 0%, -6%, and -4% on the same models. REDIPO achieves these gains while largely preserving performance on MTBench, IFEval, and Arena-Hard, and it also lowers the direct-category attack success rate on HarmBench. Ablation studies indicate that the diversity gains are driven primarily by selecting marginally diverse pairs and rewriting base responses, whereas filtering and quality-bounded pairing are crucial for sustaining alignment. These findings suggest that carefully curated preference data can reintroduce diverse, valid answers drawn from base-model generations without sacrificing the alignment benefits of post-training. Our code and data are available at https://github.com/vsamuel2003/ReDiPO.
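To make the headline metric concrete: NoveltyBench's distinct_k counts how many functionally distinct answers appear among k sampled generations by grouping equivalent generations into classes. The sketch below is a simplified illustration, not the benchmark's code: `normalized_match` is a hypothetical stand-in for NoveltyBench's learned equivalence judge, and the greedy partition is an assumption about how classes are formed. `relative_change` shows how a percentage such as "+134%" relates a REDIPO score to its instruct-checkpoint baseline.

```python
def distinct_k(generations, equivalent):
    # Greedy partition: each generation joins the first class whose
    # representative it is equivalent to; otherwise it starts a new class.
    reps = []
    for g in generations:
        if not any(equivalent(g, r) for r in reps):
            reps.append(g)
    return len(reps)


def normalized_match(a, b):
    # Hypothetical stand-in for a learned equivalence judge:
    # case- and whitespace-insensitive exact match.
    return " ".join(a.lower().split()) == " ".join(b.lower().split())


def relative_change(new, old):
    # Percentage change of a score relative to the baseline checkpoint.
    return 100.0 * (new - old) / old


gens = ["Paris", "paris ", "Lyon", "Marseille", "PARIS"]
print(distinct_k(gens, normalized_match))  # three classes: Paris, Lyon, Marseille
print(relative_change(4.68, 2.0))          # about 134.0
```

The numbers in the usage lines are illustrative only: a hypothetical instruct-model distinct_k of 2.0 rising to 4.68 would correspond to a relative change of roughly +134%.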


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
