Concept Heterogeneity-aware Representation Steering
Title: Concept Heterogeneity-aware Representation Steering
Abstract:
Representation steering provides a lightweight mechanism for modulating the behavior of large language models (LLMs) by intervening on internal activations during inference. Current methodologies predominantly depend on a single, global steering direction, usually derived through a difference-in-means calculation applied to contrastive datasets. This strategy rests on the implicit assumption that the targeted concept is uniformly represented throughout the embedding space. However, in real-world scenarios, LLM representations often display significant non-homogeneity, characterized by clustered and context-dependent structures that make global steering directions fragile.
In this study, we analyze representation steering through the lens of optimal transport (OT). We observe that conventional difference-in-means steering corresponds to the OT map between two distributions that differ only in their means (one is a translate of the other), which yields a global translation. To relax this restrictive assumption, we model source and target embeddings as Gaussian mixture models and cast steering as a discrete OT problem between semantic latent clusters. Applying barycentric projection to the resulting transport plan gives an explicit, input-dependent steering map: a smooth, kernel-weighted aggregation of cluster-level shifts. We call this method Concept Heterogeneity-aware Representation Steering (CHaRS). Across a wide range of experimental configurations, CHaRS delivers superior behavioral control compared to traditional global steering techniques.
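The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several choices here are assumptions made only for illustration: cluster means and weights are taken as given rather than fitted with a Gaussian mixture, the discrete OT problem is solved with entropic (Sinkhorn) regularization, and the kernel weights use a shared isotropic bandwidth `sigma`. The function names `diff_in_means`, `sinkhorn_plan`, `cluster_shifts`, and `steer` are hypothetical.

```python
import numpy as np

def diff_in_means(src, tgt):
    """Global steering vector: mean of target minus mean of source."""
    return tgt.mean(axis=0) - src.mean(axis=0)

def sinkhorn_plan(a, b, cost, reg=0.05, n_iters=500):
    """Entropy-regularized OT plan between discrete weights a and b.

    Assumption: Sinkhorn iterations stand in for whatever discrete OT
    solver the paper uses. The cost is rescaled by its maximum so that
    `reg` is unitless.
    """
    scale = cost.max() + 1e-12
    K = np.exp(-cost / (reg * scale))
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def cluster_shifts(mu_src, mu_tgt, a, b, reg=0.05):
    """Barycentric projection of the cluster-level plan, minus source means."""
    # Squared Euclidean cost between every source and target cluster mean.
    cost = ((mu_src[:, None, :] - mu_tgt[None, :, :]) ** 2).sum(-1)
    plan = sinkhorn_plan(a, b, cost, reg)
    # Each source cluster is sent to the plan-weighted average of target means.
    bary = (plan @ mu_tgt) / plan.sum(axis=1, keepdims=True)
    return bary - mu_src

def steer(x, mu_src, a, shifts, sigma=1.0):
    """Input-dependent steering: kernel-weighted sum of cluster shifts.

    Assumption: weights are Gaussian responsibilities with a shared
    isotropic bandwidth `sigma`.
    """
    sq = ((x[:, None, :] - mu_src[None, :, :]) ** 2).sum(-1)
    logits = np.log(a)[None, :] - sq / (2 * sigma**2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    return x + w @ shifts

# Toy data: two clusters that must move in opposite directions.
rng = np.random.default_rng(0)
mu_src = np.array([[-5.0, 0.0], [5.0, 0.0]])
mu_tgt = np.array([[-5.0, 2.0], [5.0, -2.0]])
weights = np.array([0.5, 0.5])
x = mu_src[rng.integers(0, 2, size=4)] + 0.1 * rng.standard_normal((4, 2))

shifts = cluster_shifts(mu_src, mu_tgt, weights, weights)
print("global direction:", diff_in_means(mu_src, mu_tgt))
print("cluster shifts:\n", shifts)
print("steered points:\n", steer(x, mu_src, weights, shifts))
```

In this toy setup the two cluster-level shifts point in opposite directions, so they cancel in the global difference of means, while the kernel-weighted map still moves each input according to the cluster it lies near. This is the kind of heterogeneity the abstract argues a single global direction cannot capture.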
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC






