arXiv

PieArena: Ranking and Profiling Language Agents in Realistic Negotiation Scenarios

June 3, 2026 · Chris Zhu, Sasha Cui, Will Sanok Dufallo, Runzhi Jin, Zhen Xu, Linjun Zhang, Daylian Cain


Abstract:

This study offers a comprehensive assessment of the negotiation capabilities of Large Language Models (LLMs). Negotiation is a critical business function that demands strategic reasoning, theory of mind, and the ability to generate economic value. To facilitate this analysis, we introduce PieArena, a large-scale benchmark for negotiation that relies on multi-agent interactions within realistic scenarios derived from MBA negotiation curricula at a prestigious business school. Our evaluation framework encompasses three distinct pairing regimes: mirror-play, cross-play, and human-LM interactions.

We have developed a ranking model designed for continuous negotiation payoffs. This model generates order-invariant leaderboards with quantified uncertainty, while simultaneously addressing systematic experimental asymmetries. Additionally, we investigate the impact of joint-intentionality agentic scaffolding, observing asymmetric benefits: significant performance boosts for mid- and lower-tier LMs, contrasted with diminishing returns for frontier models.

Using trained business school students as calibration anchors, we gathered human-human and human-LM negotiation data. Our findings indicate that a representative frontier language agent, GPT-5, performs on par with or better than this human baseline within our evaluation parameters. Beyond merely reporting deal outcomes, PieArena delivers a multi-dimensional behavioral profile. This profile exposes cross-model heterogeneity in areas such as instruction compliance, computational accuracy, and judge-assessed metrics of deception and reputation, thereby demonstrating the utility of evaluation methods that extend beyond outcome-centric leaderboards.
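
The abstract does not spell out the ranking model, so the following is only a minimal illustrative sketch of one way a leaderboard over continuous payoffs could be made order-invariant while absorbing a systematic role asymmetry. It assumes an additive model in which the payoff gap in a match equals the skill of the agent in role A, minus the skill of the agent in role B, plus a shared role offset. The names `fit_ratings` and `bootstrap_intervals`, the least-squares fit, and the bootstrap are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def fit_ratings(matches, n_agents):
    """Fit additive skills and a shared role offset by least squares.

    matches: list of (agent_in_role_a, agent_in_role_b, payoff_a_minus_payoff_b).
    Assumed model (hypothetical): diff = skill[a] - skill[b] + role_offset + noise.
    """
    rows, targets = [], []
    for a, b, diff in matches:
        row = np.zeros(n_agents + 1)
        row[a] += 1.0          # agent playing role A
        row[b] -= 1.0          # agent playing role B (cancels out in mirror-play)
        row[-1] = 1.0          # shared offset capturing a systematic role advantage
        rows.append(row)
        targets.append(diff)
    # Extra row pins the skills to sum to zero so the fit is identifiable.
    constraint = np.zeros(n_agents + 1)
    constraint[:n_agents] = 1.0
    X = np.vstack(rows + [constraint])
    y = np.array(targets + [0.0])
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta[:n_agents], theta[-1]

def bootstrap_intervals(matches, n_agents, n_boot=200, seed=0):
    """Percentile bootstrap over matches: rows are the 2.5% and 97.5% skill bounds."""
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(matches), size=len(matches))
        skills, _ = fit_ratings([matches[i] for i in idx], n_agents)
        samples.append(skills)
    return np.percentile(np.array(samples), [2.5, 97.5], axis=0)
```

Because all matches are fit jointly, the resulting ratings do not depend on the order in which matches were played, unlike sequential update schemes such as Elo. The shared offset term is one simple way to keep a role advantage from being credited to whichever agents happened to play that role more often.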
