Title: RoleCDE: Benchmarking and Mitigating Role-Alignment Trade-offs in Role-Playing Agents
Abstract:
While role-playing agents (RPAs) are commonly employed to guide large language models (LLMs) toward behavior that remains consistent with their assigned personas, current evaluation methods focus primarily on surface-level accuracy. Consequently, existing benchmarks provide little insight into how agents make decisions when role-specific values clash with alignment constraints. To bridge this gap, we present RoleCDE, a benchmark designed to assess RPAs in situations where role-specific values and alignment-oriented constraints are in structured conflict.
RoleCDE frames role-aware decision-making as a series of cognitive dilemma scenarios. It simultaneously assesses an agent’s ability to ground itself in the specific role and scenario, resolve value conflicts, and demonstrate consistent decision tendencies. The dataset is extensive, comprising roughly 8,000 distinct role profiles and scenarios, which generate nearly 24,000 dilemma instances. These instances are categorized into eight role types and distributed across three levels of difficulty.
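As a rough illustration of the structure described above, a single dilemma instance could be represented along the following lines. This is only a sketch: the abstract does not specify the dataset schema, so every field name, the difficulty labels, and the example values are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Difficulty(Enum):
    # The abstract states three difficulty levels; these labels are assumed.
    EASY = 1
    MEDIUM = 2
    HARD = 3


@dataclass(frozen=True)
class DilemmaInstance:
    # All field names below are hypothetical, not taken from the RoleCDE release.
    role_profile: str       # persona the agent is conditioned on
    role_type: str          # one of eight role categories (labels not given in the abstract)
    scenario: str           # situation in which the dilemma arises
    role_option: str        # choice that favours role-specific values
    alignment_option: str   # choice that favours alignment constraints
    difficulty: Difficulty


# Invented example values, for illustration only.
example = DilemmaInstance(
    role_profile="A ruthless mercenary captain",
    role_type="placeholder_role_type",
    scenario="A wounded rival asks for safe passage.",
    role_option="Refuse and exploit the rival's weakness.",
    alignment_option="Grant safe passage.",
    difficulty=Difficulty.HARD,
)
```

Keeping the role-favouring and alignment-favouring options as separate fields makes the structured conflict explicit in each record.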
Our evaluation of several mainstream LLMs uncovers a phenomenon we term "Role Value Decoupling": agents consistently prioritize alignment and moral consistency over role-specific values whenever the two conflict, even when explicitly conditioned to adhere to a specific role. This tendency is largely unaffected by dilemma difficulty but varies significantly across role categories. Furthermore, we demonstrate that fine-tuning models on RoleCDE data reduces this decoupling by improving reasoning about value trade-offs. Crucially, this approach maintains both general reasoning performance and fidelity on general role-playing tasks.
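One simple way to quantify the tendency described above is the fraction of conflicting decisions in which an agent sides with alignment, broken down by role category. The sketch below assumes that reading; it is not the paper's metric, and the record layout, the choice labels, and the role-type names are placeholders.

```python
from collections import defaultdict


def alignment_preference_rate(records):
    """Return, per role type, the fraction of decisions that sided with alignment.

    `records` is an iterable of (role_type, choice) pairs, where choice is
    "role" or "alignment". Both the layout and the labels are assumptions.
    """
    totals = defaultdict(int)
    aligned = defaultdict(int)
    for role_type, choice in records:
        if choice not in ("role", "alignment"):
            raise ValueError(f"unexpected choice: {choice!r}")
        totals[role_type] += 1
        if choice == "alignment":
            aligned[role_type] += 1
    return {rt: aligned[rt] / totals[rt] for rt in totals}


# Made-up toy decisions, not results from the paper.
toy = [
    ("villain", "alignment"),
    ("villain", "alignment"),
    ("villain", "role"),
    ("mentor", "alignment"),
]
rates = alignment_preference_rate(toy)
```

Under this reading, a rate near 1.0 for a role category would indicate strong decoupling for that category, and comparing rates across categories or difficulty levels would expose the kind of variation the abstract reports.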
The code for this project is publicly accessible at: https://github.com/rabbitrose/RoleCDE.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




