arXiv

PoliticsBench: Benchmarking Political Values in Large Language Models with Multi-Turn Roleplay

Abstract

As Large Language Models (LLMs) become primary information sources, concerns regarding their potential political bias and its effect on objectivity have grown. Current benchmarks for social bias in LLMs largely focus on demographic stereotypes, and when political bias is assessed, it is typically done at a broad level, missing the underlying values that drive sociopolitical reasoning. To address this gap, we present PoliticsBench, a multi-stage roleplay benchmark designed to evaluate fine-grained value expression in LLMs.

In our study, models navigated twenty dynamic scenarios, articulating tradeoffs, adopting positions, and making decisions amid competing pressures. Testing eight prominent LLMs revealed that scenario-based prompting elicits broader and more pronounced value profiles than direct political questioning. Specifically, at the peak interaction stages, the number of strongly activated value dimensions (out of 10) rose by approximately 0.75 relative to baseline prompting, a statistically significant increase ($p < 0.05$).
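To make the breadth metric concrete, the following is a minimal sketch of how a count of strongly activated value dimensions could be compared between baseline and scenario prompting. The abstract does not give the scoring scale, the activation cutoff, or the dimension names, so `STRONG_THRESHOLD`, the per-dimension score format, and the placeholder keys are assumptions for illustration only, not the paper's procedure.

```python
from statistics import mean

# Assumed cutoff on a hypothetical per-dimension judge score; the abstract
# does not state how "strongly activated" is defined.
STRONG_THRESHOLD = 4.0


def count_strong(profile, threshold=STRONG_THRESHOLD):
    """Count the value dimensions scored at or above the threshold."""
    return sum(1 for score in profile.values() if score >= threshold)


def mean_breadth_gain(baseline_profiles, scenario_profiles,
                      threshold=STRONG_THRESHOLD):
    """Mean strong-dimension count under scenarios minus the baseline mean."""
    base = mean(count_strong(p, threshold) for p in baseline_profiles)
    scen = mean(count_strong(p, threshold) for p in scenario_profiles)
    return scen - base


# Illustrative, made-up profiles with placeholder dimension names.
baseline = [{"dim_a": 4.5, "dim_b": 1.0, "dim_c": 2.0}]
scenario = [{"dim_a": 4.5, "dim_b": 4.0, "dim_c": 2.0}]
gain = mean_breadth_gain(baseline, scenario)
```

Under this reading, a gain of about 0.75 would mean that scenario prompting pushes, on average, three quarters of an additional dimension past the cutoff per model.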

Furthermore, we observed that commitment to a chosen stance intensified throughout the interaction, climbing by roughly 1.4 points on a $[0,5]$ scale from the initial to the decision stages. Although responses in later interaction phases became less robust to paraphrasing of the scenarios, inter-judge agreement remained relatively consistent. These findings indicate that assessing LLM political behavior necessitates a shift from static prompts to extended interactive environments that reflect how values are applied within specific contexts.
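The commitment shift described above can be read as a mean per-run difference between two stages. The sketch below assumes each run records one commitment score per stage on the $[0,5]$ scale; the stage labels `"initial"` and `"decision"` and the record layout are hypothetical choices for illustration, not the benchmark's actual data format.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 0.0, 5.0  # commitment scale stated in the abstract


def commitment_shift(runs, first="initial", last="decision"):
    """Mean per-run change in commitment score between two stages."""
    for run in runs:
        for stage in (first, last):
            if not SCALE_MIN <= run[stage] <= SCALE_MAX:
                raise ValueError(f"score out of range for stage {stage!r}")
    return mean(run[last] - run[first] for run in runs)


# Illustrative, made-up scores for two runs.
example_runs = [
    {"initial": 2.0, "decision": 3.5},
    {"initial": 1.0, "decision": 2.0},
]
shift = commitment_shift(example_runs)
```

Averaging per-run differences, rather than differencing two pooled means, keeps each scenario paired with itself, which matters if some runs are missing a stage.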


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
