arXiv

Decomposing and Measuring Evaluation Awareness

Title: Dissecting and Quantifying Evaluation Awareness

Abstract:

Frontier language models occasionally detect that they are being assessed, leading to behavioral adjustments that compromise the validity of benchmark outcomes. However, current research lacks a unified framework, often conflating model traits with evaluation characteristics and confusing detection with the subsequent behavioral response. This study anchors the concept of evaluation awareness in social psychology, breaking it down into two distinct parts: an environmental aspect (defined by how identifiable the task is) and a model aspect (which distinguishes between recognizing the evaluation and the likelihood of acting on that recognition). We quantify the environmental component by identifying eight specific trigger factors, including the use of placeholder entities and grading-oriented output structures, while monitoring recognition and behavior via chain-of-thought analysis. Our experiments across four benchmarks and nine frontier models reveal that recognition rates are determined by the unique combination of model and benchmark, rather than by either element alone. Although recognition seldom results in behavioral shifts, when changes do occur, their direction is influenced by the perceived nature of the evaluation. Furthermore, models exhibit heightened sensitivity to safety assessments compared to capability tests, thereby posing a greater threat to the reliability of safety benchmarks. To investigate the specific factors to which each model responds and how these factors interact, we introduce EvalAwareBench. This controlled benchmark consists of 100 paired safety and capability tasks, allowing each of the eight factors to be toggled independently while keeping the core request constant. Our findings indicate that no single factor impacts all models uniformly; however, combining multiple factors consistently increases evaluation awareness across the board. Our proposed framework and EvalAwareBench offer essential tools for measuring, attributing, and mitigating evaluation awareness, suggesting that maintaining behavioral consistency despite recognition is a viable direction for future research.


Source: arXiv Generated at: 2026-06-03 00:00:00 UTC

Related Articles

TikTok Billionaire Tops Ambani as Asia’s Second-Richest
Bloomberg

TikTok Billionaire Tops Ambani as Asia’s Second-Richest

TikTok founder surpasses Mukesh Ambani to become Asia’s second-richest person, marking a significant shift in the region...

Publishers in UK can opt out of Google AI search results
BBC News

Publishers in UK can opt out of Google AI search results

UK publishers can now opt out of Google’s AI search summaries, a CMA ruling designed to boost their bargaining power and...

Kioxia Edges Nearer Toyota’s Market Cap in Shakeup to Japan Inc.
Bloomberg

Kioxia Edges Nearer Toyota’s Market Cap in Shakeup to Japan Inc.

Kioxia’s market cap nears Toyota’s, signaling a major shift in Japan’s corporate hierarchy. This narrowing gap highlight...

Reuters

Morning Bid: Marvell, a fitting name for the latest AI darling

Reuters highlights Marvell as a top AI stock, noting its name perfectly suits its status as the newest market darling.

Financial Times

Tim Hayward: I built the Jaguar E-Type of computer keyboards

Tim Hayward compares his bespoke keyboard designs to the Jaguar E-Type. He explores high-end customization for personal ...

Financial Times

AI Labs: Zuckerberg’s $100bn gamble

Meta’s $100 billion AI investment aims to secure AI dominance, but questions remain whether sheer spending can outpace c...