arXiv

Coherence Maximization Improves Pluralistic Alignment

June 3, 2026 · Taslim Mahbub, Yiding Pei, Shi Feng

Title: Enhancing Pluralistic Alignment Through Coherence Maximization

Original: arXiv:2606.03110v1 | Announce Type: new

Abstract: Aligning AI systems with diverse human values requires value specifications grounded in concrete examples, but generating such examples without extensive human supervision remains an open challenge. We investigate what makes these examples effective, using Internal Coherence Maximization (ICM) -- which infers labels by maximizing their mutual predictability -- to generate persona-specific examples that steer a model toward a target group's values, without human supervision. Across four benchmarks spanning classification, preference, and open-ended generation, ICM-inferred in-context examples match the performance of gold labels. Crucially, coherence matters beyond individual label accuracy: with accuracy held constant, more coherent examples generalize substantially better than incoherent ones. For personas underrepresented in pretraining data, targeted human feedback on the questions where the model is least certain about a persona's values yields better generalization than the same number of labels on arbitrary questions. These results identify coherence as a key design principle for scalable value specification, leveraging the diverse human perspectives already encoded in pretrained language models.

Rewritten: Title: Boosting Pluralistic Alignment via Coherence Maximization

Source: arXiv:2606.03110v1 | Status: New Announcement

Abstract: Aligning AI systems with a broad range of human values requires value specifications grounded in concrete examples, yet producing such examples without extensive human supervision remains an open challenge. This study investigates what makes these examples effective, using Internal Coherence Maximization (ICM). ICM infers labels by maximizing their mutual predictability, and is used here to generate persona-specific examples that steer a model toward a target group's values without human supervision.
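The idea of inferring labels by maximizing their mutual predictability can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the function names are hypothetical, a similarity-weighted leave-one-out vote over one-dimensional features stands in for the language model that would score each label given the others, and a simple hill climb over single label flips stands in for whatever search procedure the paper uses.

```python
import math
import random


def loo_prob(i, labels, feats):
    """Toy stand-in for the model: probability of labels[i] given every
    other labeled example, via a similarity-weighted vote."""
    agree, total = 1.0, 2.0  # add-one smoothing over the two classes
    for j in range(len(labels)):
        if j == i:
            continue
        weight = math.exp(-abs(feats[i] - feats[j]))  # closer items count more
        total += weight
        if labels[j] == labels[i]:
            agree += weight
    return agree / total


def mutual_predictability(labels, feats):
    """Sum of log-probabilities of each label given all the others."""
    return sum(math.log(loo_prob(i, labels, feats)) for i in range(len(labels)))


def infer_labels(feats, steps=2000, seed=0):
    """Hill-climb over single label flips, keeping a flip only if the
    mutual-predictability score does not decrease."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in feats]
    best = mutual_predictability(labels, feats)
    for _ in range(steps):
        i = rng.randrange(len(feats))
        labels[i] ^= 1  # propose flipping one label
        score = mutual_predictability(labels, feats)
        if score >= best:
            best = score
        else:
            labels[i] ^= 1  # revert the rejected flip
    return labels, best


if __name__ == "__main__":
    feats = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]  # two well-separated groups
    print(infer_labels(feats))
```

On two well-separated groups of items, the search settles on a labeling in which each group shares a single label, since any disagreeing item is poorly predicted by its neighbors. The sketch covers only the mutual-predictability term; nothing in it prevents the two groups from receiving the same label.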

Across four benchmarks spanning classification, preference tasks, and open-ended generation, in-context examples inferred via ICM match the performance of gold labels. Coherence also matters beyond individual label accuracy: with accuracy held constant, more coherent examples generalize substantially better than incoherent ones.

For personas that are underrepresented in pretraining data, targeting human feedback at the questions where the model is least certain about a persona's values yields better generalization than spending the same number of labels on arbitrary questions. These findings identify coherence as a key design principle for scalable value specification, one that draws on the diverse human perspectives already encoded in pretrained language models.
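Targeting feedback at the least certain questions can be sketched as a ranking step. The sketch below is illustrative only: it assumes binary questions, uses binary entropy as the uncertainty measure (the abstract does not say which measure the authors use), and the names `binary_entropy`, `select_for_feedback`, and the question IDs are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a yes/no prediction with probability p of 'yes'."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_for_feedback(probs, budget):
    """Return the `budget` question IDs whose predictions are most uncertain.

    probs maps a question ID to the model's probability that the persona
    would answer 'yes' (an assumed input format).
    """
    ranked = sorted(probs, key=lambda q: binary_entropy(probs[q]), reverse=True)
    return ranked[:budget]


if __name__ == "__main__":
    probs = {"q1": 0.95, "q2": 0.52, "q3": 0.10, "q4": 0.40}
    print(select_for_feedback(probs, budget=2))  # ['q2', 'q4']
```

With a labeling budget of two, the questions closest to a 50/50 prediction are sent for human feedback, while the confidently predicted ones are left to the model.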

Source: arXiv | Generated at: 2026-06-03 00:00:00 UTC