Decomposing how prompting steers behavior
Title: Unpacking the Mechanisms of Prompt-Driven Behavioral Control
Abstract:
While prompting effectively directs the behavior of large language models (LLMs) and vision-language models (VLMs) without requiring weight adjustments, the precise manner in which instructional changes alter internal representations to generate these behaviors remains poorly understood. To address this, we propose a nested geometric decomposition framework that conceptualizes prompting as a transformation of the representational geometry associated with the content following the prompt.
For each pair of prompts, we align the representations of identical stimuli under the two instructions using a hierarchy of increasingly expressive stimulus-invariant maps: translation, rigid transformation with uniform scaling, sequential axis scaling, affine transformation, and nonlinear transformation. We then evaluate the causal impact of each map: on held-out stimuli, we replace a single layer’s hidden state under prompt A with its mapped counterpart and measure how much of prompt B’s representational geometry and behavior is recovered.
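As an illustration of the nested-map idea, the following is a minimal NumPy sketch and not the authors' implementation. It assumes `X` and `Y` are (stimuli × hidden-dimension) arrays of one layer's activations for the same stimuli under prompts A and B. All function names are hypothetical. Only three tiers are shown (translation, orthogonal map with uniform scaling, affine); the axis-scaling and nonlinear tiers and the model-patching step are omitted. The orthogonal Procrustes solution used here permits reflections, which a strictly rigid map would exclude.

```python
import numpy as np

def fit_translation(X, Y):
    """Tier 1: Y ≈ X + t, where t is the difference of the means."""
    t = Y.mean(axis=0) - X.mean(axis=0)
    return lambda Z: Z + t

def fit_similarity(X, Y):
    """Tier 2: Y ≈ s * (X - mx) @ R + my with R orthogonal (Procrustes)."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    U, S, Vt = np.linalg.svd(Xc.T @ Yc)
    R = U @ Vt                      # optimal orthogonal map (may reflect)
    s = S.sum() / (Xc ** 2).sum()   # optimal uniform scale
    return lambda Z: s * (Z - mx) @ R + my

def fit_affine(X, Y):
    """Tier 3: Y ≈ X @ W + b, solved by ordinary least squares."""
    X1 = np.hstack([X, np.ones((len(X), 1))])
    W, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    return lambda Z: np.hstack([Z, np.ones((len(Z), 1))]) @ W

def unexplained_shift(f, X, Y):
    """Fraction of the A-to-B activation change left unexplained by map f."""
    return float(((f(X) - Y) ** 2).sum() / ((Y - X) ** 2).sum())

def tier_errors(X_tr, Y_tr, X_te, Y_te):
    """Fit each tier on training stimuli and score it on held-out stimuli."""
    fitters = {
        "translation": fit_translation,
        "similarity": fit_similarity,
        "affine": fit_affine,
    }
    return {
        name: unexplained_shift(fit(X_tr, Y_tr), X_te, Y_te)
        for name, fit in fitters.items()
    }
```

Because each family contains the previous one, the training-set error cannot increase from one tier to the next. Held-out error is the quantity of interest, since it shows whether the extra flexibility generalizes across stimuli.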
Our analysis, conducted across three LLMs, three VLMs, and six datasets covering text and images (encompassing style, emotion, scene content, and numerical data), demonstrates that prompts consistently reshape internal representations to align with the structure of the instructed task. Cross-validated variance decomposition indicates that a substantial share of the prompt-induced activation change is explained by shape-preserving maps, particularly translation and rigid transformations with uniform scaling. Furthermore, tier profiles across layers reveal routing strategies that vary by model and task.
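One plausible way to formalize such a tier-wise decomposition is sketched below; it is an assumption for illustration, not the paper's stated definition. Here $h_A(x)$ and $h_B(x)$ denote a layer's hidden state for stimulus $x$ under prompts A and B, $f_k$ is the tier-$k$ map fitted on training stimuli, and $\mathcal{D}_{\text{test}}$ is the held-out set. All of this notation is introduced here.

```latex
R^2_k \;=\; 1 - \frac{\sum_{x \in \mathcal{D}_{\text{test}}} \bigl\lVert h_B(x) - f_k\bigl(h_A(x)\bigr) \bigr\rVert^2}
                     {\sum_{x \in \mathcal{D}_{\text{test}}} \bigl\lVert h_B(x) - h_A(x) \bigr\rVert^2},
\qquad
\Delta_k \;=\; R^2_k - R^2_{k-1}, \quad R^2_0 = 0 .
```

Under this definition the increments telescope, $\sum_{k=1}^{K} \Delta_k = R^2_K$, so each $\Delta_k$ can be read as the additional share of the prompt-induced change captured once tier $k$ is allowed.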
Notably, while the translation and rigid tiers improve behavioral agreement, the affine tier is the first to nearly restore the target prompt’s task geometry, with corresponding improvements in behavior. This finding implies that cross-dimensional linear mixing is a primary mechanism through which prompts reorganize representations to fit the instructed task structure. Overall, our framework breaks down prompt-induced representational shifts into interpretable geometric components, clarifying how models route task-relevant information to carry out prompt-driven behavior.
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



