arXiv

Coherence Maximization Improves Pluralistic Alignment

Title: Enhancing Pluralistic Alignment Through Coherence Maximization

Original: arXiv:2606.03110v1 | Announce Type: new

Abstract: Aligning AI systems with diverse human values requires value specifications grounded in concrete examples, but generating such examples without extensive human supervision remains an open challenge. We investigate what makes these examples effective, using Internal Coherence Maximization (ICM) -- which infers labels by maximizing their mutual predictability -- to generate persona-specific examples that steer a model toward a target group's values, without human supervision. Across four benchmarks spanning classification, preference, and open-ended generation, ICM-inferred in-context examples match the performance of gold labels. Crucially, coherence matters beyond individual label accuracy: with accuracy held constant, more coherent examples generalize substantially better than incoherent ones. For personas underrepresented in pretraining data, targeted human feedback on the questions where the model is least certain about a persona's values yields better generalization than the same number of labels on arbitrary questions. These results identify coherence as a key design principle for scalable value specification, leveraging the diverse human perspectives already encoded in pretrained language models.

Rewritten: Title: Boosting Pluralistic Alignment via Coherence Maximization

Source: arXiv:2606.03110v1 | Status: New Announcement

Abstract: To align artificial intelligence with a broad spectrum of human values, value specifications must be grounded in concrete examples. However, generating these examples without extensive human supervision remains an open challenge. This study investigates what makes such examples effective, using Internal Coherence Maximization (ICM). ICM infers labels by maximizing their mutual predictability, which makes it possible to generate persona-specific examples that steer a model toward a target group's values without human supervision.
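To make "maximizing mutual predictability" concrete, here is a minimal toy sketch in Python. It is not the paper's implementation: the abstract does not describe the predictor or the search procedure, so everything below is an assumption made for illustration. The "questions" are stand-in 1-D points, a kernel-weighted leave-one-out vote stands in for the pretrained model as predictor, and a steepest-ascent search flips one binary label at a time. The names `loo_log_prob`, `mutual_predictability`, and `maximize_coherence` are hypothetical.

```python
import math

def loo_log_prob(i, points, labels, bandwidth=1.0, eps=1e-3):
    """Log-probability of labels[i] as predicted from all OTHER labeled points.

    Assumption: a kernel-weighted vote is a toy stand-in for a language model
    that predicts one label from the remaining labeled examples.
    """
    same, total = eps, 2 * eps  # light smoothing keeps the log finite
    for j, (x, y) in enumerate(zip(points, labels)):
        if j == i:
            continue
        weight = math.exp(-abs(points[i] - x) / bandwidth)
        total += weight
        if y == labels[i]:
            same += weight
    return math.log(same / total)

def mutual_predictability(points, labels, bandwidth=1.0):
    """Sum of leave-one-out log-probabilities over the whole label set."""
    return sum(loo_log_prob(i, points, labels, bandwidth)
               for i in range(len(points)))

def maximize_coherence(points, labels, bandwidth=1.0, max_steps=100):
    """Steepest-ascent local search: repeatedly apply the single label flip
    that raises mutual predictability the most, until no flip helps."""
    labels = list(labels)
    best = mutual_predictability(points, labels, bandwidth)
    for _ in range(max_steps):
        best_flip, best_score = None, best
        for i in range(len(labels)):
            labels[i] = 1 - labels[i]          # try the flip
            score = mutual_predictability(points, labels, bandwidth)
            labels[i] = 1 - labels[i]          # undo it
            if score > best_score + 1e-12:
                best_flip, best_score = i, score
        if best_flip is None:
            break                              # local optimum reached
        labels[best_flip] = 1 - labels[best_flip]
        best = best_score
    return labels, best

# Two well-separated groups of items; two labels start out flipped.
points = [0.0, 0.2, 0.4, 0.6, 5.0, 5.2, 5.4, 5.6]
noisy_labels = [0, 0, 1, 0, 1, 1, 0, 1]

initial_score = mutual_predictability(points, noisy_labels)
coherent_labels, final_score = maximize_coherence(points, noisy_labels)
print(coherent_labels, round(initial_score, 3), round(final_score, 3))
```

In this toy setup the search is expected to correct the two inconsistent labels, because each one is poorly predicted by its neighbours. Note one limitation of the sketch: a score based on mutual predictability alone is also maximized by giving every item the same label, so the code only performs a local search from a mostly sensible starting point and does not address that degenerate solution.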

Across four benchmarks spanning classification, preference, and open-ended generation, in-context examples inferred via ICM match the performance of gold labels. Crucially, coherence matters beyond individual label accuracy: with accuracy held constant, more coherent examples generalize substantially better than incoherent ones.

Furthermore, for personas underrepresented in pretraining data, targeted human feedback on the questions where the model is least certain about a persona's values yields better generalization than the same number of labels on arbitrary questions. These findings identify coherence as a key design principle for scalable value specification, one that leverages the diverse human perspectives already encoded in pretrained language models.
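The idea of spending a fixed labeling budget on the questions where the model is least certain can be sketched as follows. This is an illustrative assumption, not the paper's procedure: the abstract does not say how uncertainty is measured, so the sketch uses binary entropy of a model's predicted probability that the persona agrees with each question. The function names and the probability values are hypothetical.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a yes/no prediction with probability p of 'yes'."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def select_for_feedback(probabilities, budget):
    """Return the `budget` question ids with the highest predictive entropy.

    `probabilities` maps a question id to the model's (hypothetical) estimate
    that the persona would answer 'yes'.
    """
    ranked = sorted(probabilities,
                    key=lambda q: binary_entropy(probabilities[q]),
                    reverse=True)
    return ranked[:budget]

# Made-up model estimates for five questions about one persona.
model_estimates = {"q1": 0.98, "q2": 0.55, "q3": 0.10, "q4": 0.47, "q5": 0.80}

to_label = select_for_feedback(model_estimates, budget=2)
print(to_label)  # ['q4', 'q2']
```

The questions closest to a 50/50 prediction are chosen for human labeling, while questions the model already answers confidently are left alone. An arbitrary-question baseline would instead draw the same number of ids without looking at the estimates.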


Source: arXiv | Generated at: 2026-06-03 00:00:00 UTC
