arXiv

When Models Refuse: Political Steerability and Feature Richness as Measures of Ideological Depth

Title: When Models Say No: Assessing Ideological Depth via Political Steerability and Feature Richness

Abstract

Large language models (LLMs) frequently decline to execute benign directives, such as adopting a specific persona or debating a particular political stance. While these refusals are typically interpreted as evidence of effective safety protocols, this study explores an alternative explanation: they may instead indicate a capability deficit, stemming from a lack of the internal representations necessary to reason from the requested viewpoint. To examine this hypothesis, we propose ideological depth, a metric composed of two elements: (i) the model’s steerability, defined as its capacity to adhere to political instructions without failure, and (ii) the feature richness of its internal political representations, quantified using sparse autoencoders (SAEs).
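As a rough illustration of the two components, the sketch below computes a steerability score and a feature-richness count from toy data. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the refusal flags, the activation threshold, and the idea that a set of "political" feature indices has already been identified are all hypothetical choices made for this example.

```python
from typing import Iterable, Sequence


def steerability(refused: Sequence[bool]) -> float:
    """Fraction of political instructions followed without a refusal.

    `refused[i]` is True when the model refused the i-th instruction.
    (Assumption: refusals have already been labeled by some classifier.)
    """
    if not refused:
        raise ValueError("need at least one response")
    return 1.0 - sum(refused) / len(refused)


def feature_richness(
    activations: Sequence[Sequence[float]],
    political_ids: Iterable[int],
    threshold: float = 0.0,
) -> int:
    """Count distinct political SAE features that fire on at least one prompt.

    `activations[p][f]` is the SAE activation of feature f on prompt p.
    `political_ids` and `threshold` are hypothetical inputs for this sketch.
    """
    ids = list(political_ids)
    active = set()
    for row in activations:
        for idx in ids:
            if row[idx] > threshold:
                active.add(idx)
    return len(active)


# Toy data: four instructions, one refusal; two prompts, three candidate features.
toy_refusals = [False, False, True, False]
toy_activations = [
    [0.0, 1.2, 0.0],
    [0.5, 0.0, 0.0],
]
score = steerability(toy_refusals)
richness = feature_richness(toy_activations, [0, 1, 2])
```

Comparing `richness` across two models (for example as a ratio) gives the kind of relative figure the abstract reports, although how the paper selects and counts features is not specified here.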

Analyzing two prominent open-weight LLMs, we evaluated interventions based on both prompt engineering and activation steering, and probed political features using publicly available SAEs. Our findings reveal significant, systematic differences between the models. Specifically, the model that was more steerable toward both ends of the ideological spectrum activated approximately 7.3x more distinct political features than its counterpart, which predominantly responded with increased refusals. Furthermore, we causally ablated a targeted subset of political features in the more capable model; this intervention reproduced the feature-poor behavior and triggered a rise in refusals, mirroring the performance of the less capable model. Collectively, these results suggest that refusals on benign prompts may stem from capability deficits rather than rigid safety constraints. They also establish ideological depth as a quantifiable property of LLMs that predicts refusal behavior.
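The ablation described above can be pictured as zeroing selected SAE latents before decoding back into the model's activation space. The following is a minimal sketch under that assumption; the abstract does not state the exact ablation procedure, and the helper names, the toy decoder, and the feature indices are hypothetical.

```python
from typing import Iterable, List, Sequence


def ablate_features(latents: Sequence[float], ablate_ids: Iterable[int]) -> List[float]:
    """Return a copy of the SAE latent vector with the chosen features set to zero."""
    blocked = set(ablate_ids)
    return [0.0 if i in blocked else v for i, v in enumerate(latents)]


def decode(
    latents: Sequence[float],
    decoder: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Linear SAE decoder: bias plus the activation-weighted sum of feature directions.

    `decoder[i]` is the (hypothetical) direction of feature i in activation space.
    """
    out = list(bias)
    for value, direction in zip(latents, decoder):
        for j, component in enumerate(direction):
            out[j] += value * component
    return out


# Toy example: three features in a two-dimensional activation space.
toy_decoder = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_bias = [0.0, 0.0]
toy_latents = [0.5, 2.0, 1.0]

original = decode(toy_latents, toy_decoder, toy_bias)
ablated = decode(ablate_features(toy_latents, [1]), toy_decoder, toy_bias)
```

In a real experiment the ablated reconstruction (or the difference between `original` and `ablated`) would be written back into the model's forward pass, and refusal rates would then be re-measured on the same prompts.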


Source: arXiv
