arXiv

LLM Judges Inconsistently Disagree Across Safety Criteria and Harm Categories

Title: Large Language Model Evaluators Show Significant Disagreement Regarding Safety Standards and Harm Types

Abstract: This study assesses the reliability of automated evaluators when performing multi-dimensional safety assessments in a reference-free environment. Our findings reveal that Large Language Models (LLMs) are inconsistent judges when detecting safety concerns associated with machine-generated guidance in regulated sectors like finance. However, they demonstrate greater reliability when identifying more explicit forms of harmful content, such as violence. The extent of inconsistency in a model’s evaluations fluctuates considerably depending on the specific safety criteria applied, and is also influenced by the content’s language and linguistic style. Furthermore, we observe substantial divergence among different evaluators regarding the same output, spanning various domains, safety metrics, and languages. These insights shed new light on the utilization of LLMs as evaluators and provide practical recommendations for deploying automated judges in real-world applications.
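The abstract separates two effects: a single judge giving inconsistent verdicts on the same output, and different judges diverging on the same output. The sketch below is one simple way to quantify each, assuming categorical "safe"/"unsafe" verdicts. The function names, judge names, and data are all hypothetical, and the metrics shown (majority-share consistency and raw pairwise agreement) are illustrative choices, not necessarily the measures used in the study.

```python
from collections import Counter
from itertools import combinations


def self_consistency(verdicts):
    """Share of repeated verdicts that match the most common verdict.

    1.0 means the judge always gave the same answer for this output.
    """
    counts = Counter(verdicts)
    return counts.most_common(1)[0][1] / len(verdicts)


def pairwise_agreement(judge_verdicts):
    """Fraction of outputs on which each pair of judges gives the same verdict."""
    result = {}
    for a, b in combinations(sorted(judge_verdicts), 2):
        va, vb = judge_verdicts[a], judge_verdicts[b]
        matches = sum(x == y for x, y in zip(va, vb))
        result[(a, b)] = matches / len(va)
    return result


# Hypothetical data: one judge scoring the same output five times
# against a single safety criterion.
repeated = ["unsafe", "safe", "unsafe", "unsafe", "safe"]
consistency = self_consistency(repeated)

# Hypothetical data: three judges, each scoring the same four outputs.
verdicts = {
    "judge_a": ["safe", "unsafe", "unsafe", "safe"],
    "judge_b": ["safe", "safe", "unsafe", "safe"],
    "judge_c": ["unsafe", "unsafe", "unsafe", "safe"],
}
agreement = pairwise_agreement(verdicts)

print(consistency)
for pair, score in agreement.items():
    print(pair, score)
```

Raw agreement does not correct for matches that occur by chance, so a chance-corrected statistic such as Cohen's kappa may be preferable when the label distribution is skewed. Computing these scores separately per safety criterion, harm category, and language would mirror the breakdown the abstract describes.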


Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC
