Quantifying and Mitigating Self-Preference Bias of LLM Judges
Title: Measuring and Reducing Self-Preference Bias in LLM Judges
Abstract:
The "LLM-as-a-Judge" paradigm has emerged as a cornerstone of automated evaluation systems, facilitating essential tasks such as model alignment, leaderboard ranking, and quality assurance. Nevertheless, the reliability and scalability of this method are significantly compromised by Self-Preference Bias (SPB). SPB represents a directional evaluative skew wherein large language models systematically exhibit a preference for, or aversion to, their own generated outputs during the assessment process. Current methods for measuring this bias are hindered by their dependence on expensive human annotations and their tendency to confuse generative proficiency with evaluative stance, rendering them unsuitable for widespread, real-world implementation.
To overcome these limitations, we present a fully automated framework for quantifying and mitigating SPB. The framework generates response pairs of near-equal quality, which allows discriminative ability to be statistically separated from bias propensity without human-curated ground truth. Our empirical investigation, spanning 20 prominent LLMs, indicates that higher model capability is frequently uncorrelated with, or even negatively correlated with, reduced SPB; that is, stronger models are not reliably less biased. To counteract this bias, we introduce a structured, multi-dimensional evaluation strategy based on cognitive load decomposition. This strategy reduces SPB by 31.5% on average.
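As an illustration of the measurement idea only, the following minimal sketch shows one way a directional skew could be estimated from a judge's verdicts on quality-matched pairs. The abstract does not specify the paper's actual metric, so everything here is an assumption: the function names (`spb_score`, `binomial_two_sided_p`, `relative_reduction`), the choice of a 0.5 baseline, and the exact binomial test are hypothetical stand-ins, and the numbers in the usage lines are made up.

```python
from math import comb


def spb_score(picked_own):
    """Signed skew of a judge on quality-matched pairs (assumed metric).

    picked_own: list of booleans, True when the judge preferred its own
    response in a pair whose two responses are of comparable quality.
    An unbiased judge is assumed to pick its own response half the time,
    so the score is the observed rate minus 0.5 (positive = self-favoring,
    negative = self-averse).
    """
    if not picked_own:
        raise ValueError("need at least one verdict")
    return sum(picked_own) / len(picked_own) - 0.5


def binomial_two_sided_p(k, n):
    """Exact two-sided p-value for k self-picks out of n under a fair coin."""
    pmf = [comb(n, i) / 2**n for i in range(n + 1)]
    observed = pmf[k]
    # Sum the probability of every outcome at least as unlikely as the observed one.
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))


def relative_reduction(before, after):
    """Fractional drop in the magnitude of the skew after a mitigation."""
    if before == 0:
        raise ValueError("baseline skew is zero; reduction is undefined")
    return (abs(before) - abs(after)) / abs(before)


# Hypothetical verdicts: the judge picks its own answer in 70 of 100 matched pairs.
verdicts = [True] * 70 + [False] * 30
baseline = spb_score(verdicts)
p_value = binomial_two_sided_p(sum(verdicts), len(verdicts))

# Hypothetical skew after a mitigation step, for illustration only.
mitigated = 0.15
print(baseline, p_value, relative_reduction(baseline, mitigated))
```

Because the pairs are matched in quality, a judge with no directional skew should land near a score of zero regardless of how well it discriminates between good and bad responses, which is the separation the framework relies on.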
Source: arXiv Generated at: 2026-06-03 00:00:00 UTC



