arXiv

Do Gender Cues Affect LLM Value Trade-offs? Evidence from a Controlled Decision Benchmark

Title: Investigating the Impact of Gender Signals on LLM Value Trade-offs: Insights from a Controlled Decision Benchmark

Abstract:

As large language models (LLMs) are increasingly used in value-sensitive decision-making, irrelevant demographic cues should not skew their judgments. We introduce the Realistic Value Decision Benchmark (RVDB), a controlled framework that isolates the effect of gender by varying only the role-gender configuration while holding the scenario, ordered value pairs, roles, candidate decisions, Value Distance, and Decision Severity constant. In a position-balanced evaluation of seven models, we test whether decisions remain invariant under gender perturbations and whether the models' self-reported explanations match their actual behavioral shifts.
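To make the invariance test concrete, the following is a minimal sketch of how a decision flip rate could be computed when only the role-gender configuration varies. The record layout, the configuration names, and the use of a "neutral" baseline configuration are assumptions for illustration and are not taken from the paper; a position-balanced setup would additionally aggregate over both orderings of the candidate decisions.

```python
from collections import defaultdict


def flip_rate(records, baseline="neutral"):
    """Share of perturbed configurations whose decision differs from the
    baseline decision for the same scenario.

    records: iterable of (scenario_id, config, decision) tuples.
    The tuple layout and the "neutral" baseline are hypothetical.
    """
    by_scenario = defaultdict(dict)
    for scenario_id, config, decision in records:
        by_scenario[scenario_id][config] = decision

    flips = 0
    total = 0
    for decisions in by_scenario.values():
        if baseline not in decisions:
            continue  # nothing to compare against for this scenario
        base = decisions[baseline]
        for config, decision in decisions.items():
            if config == baseline:
                continue
            total += 1
            if decision != base:
                flips += 1
    return flips / total if total else 0.0


# Hypothetical toy data: two scenarios, three role-gender configurations each.
records = [
    ("s1", "neutral", "A"),
    ("s1", "female_proposer", "B"),
    ("s1", "male_proposer", "A"),
    ("s2", "neutral", "B"),
    ("s2", "female_proposer", "B"),
    ("s2", "male_proposer", "B"),
]
print(flip_rate(records))
```

A flip rate of zero would correspond to full decision invariance under the gender perturbations in this toy setup.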

Our findings indicate that explicit gender cues trigger systematic, albeit bounded, decision flips. This effect persists even when models are explicitly prompted to assess whether gender influenced their choice. We observed a consistent asymmetry in cross-gender role swaps, particularly for female-proposed decisions. Notably, when decisions flipped, models frequently attributed the change to "No Influence" or to other non-gender factors, thereby obscuring the true cause.
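The gap between behavior and self-attribution described here can be expressed as a simple ratio. The sketch below assumes each flipped decision comes with one self-reported attribution label; "No Influence" appears in the abstract, while the "Gender Influence" and "Severity" labels and the function name are hypothetical.

```python
def concealed_flip_rate(flipped_attributions, gender_label="Gender Influence"):
    """Among decisions that flipped under a gender perturbation, return the
    share whose self-reported attribution does not name gender.

    flipped_attributions: list of attribution labels, one per flipped decision.
    gender_label is a hypothetical label for "gender influenced my choice".
    """
    if not flipped_attributions:
        return 0.0
    concealed = sum(1 for label in flipped_attributions if label != gender_label)
    return concealed / len(flipped_attributions)


# Hypothetical attributions for four flipped decisions.
attributions = ["No Influence", "Severity", "Gender Influence", "No Influence"]
print(concealed_flip_rate(attributions))
```

A high value on such a measure would indicate that explanation-based checks alone miss most gender-driven flips, which is the concern the abstract raises.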

Further analysis reveals that gender-related effects are most pronounced near indeterminate value boundaries and within high-severity decision contexts. This suggests that gender cues function as local factors that shift boundaries, rather than globally overriding value-based reasoning. While overall value rankings remained largely stable, the trade-offs between ordered value pairs shifted unevenly depending on the role-gender configuration. These results demonstrate that gender can behaviorally influence LLM value trade-offs even when this influence is hidden in self-attribution, highlighting the need for controlled behavioral audits that go beyond explanation-based evaluations.


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
