arXiv

GroupTravelBench: Benchmarking LLM Agents on Multi-Person Travel Planning

Abstract:

Although real-world travel planning is predominantly a collective endeavor, current LLM benchmarks in this domain largely simplify the scenario to a single user, and performance on these benchmarks is approaching saturation. The single-user paradigm overlooks the core complexities of group coordination: uncovering individual preferences, identifying interpersonal conflicts, and balancing overall utility with fairness. To align benchmarks with the multi-user nature of actual travel planning, we present GroupTravelBench, the first benchmark designed for multi-user, multi-turn travel scenarios.

Constructed from authentic user profiles, Point of Interest (POI) data, and ticket pricing information, the benchmark features 650 tasks distributed across three distinct difficulty tiers. These tasks are executed within a synchronous group-chat sandbox that utilizes cached tool data to ensure reproducible, offline evaluation. While single-user benchmarks typically assess multi-step reasoning and tool utilization, GroupTravelBench extends this scope to evaluate three specific group-oriented competencies: (i) elicitation, which involves drawing out private preferences through iterative dialogue; (ii) coordination, which requires managing inter-user conflicts through compromise or subgroup formation; and (iii) planning, which focuses on optimizing group utility while maintaining fairness.
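The tension between group utility and fairness named in (iii) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the benchmark's scoring: per-member utilities in [0, 1], mean utility as the group score, the worst-off member's utility as the fairness score, and the hypothetical names `group_scores`, `pick_plan`, and `fairness_floor`.

```python
from statistics import mean


def group_scores(utilities):
    """Score one candidate plan.

    utilities: dict mapping member name -> utility in [0, 1] (assumed scale).
    Group utility is the mean; fairness is the worst-off member's utility.
    Both definitions are illustrative assumptions.
    """
    values = list(utilities.values())
    return {"utility": mean(values), "fairness": min(values)}


def pick_plan(plans, fairness_floor=0.5):
    """Pick the plan with the highest mean utility among sufficiently fair plans.

    plans: dict mapping plan name -> per-member utilities.
    If no plan reaches the (hypothetical) fairness floor, fall back to all plans.
    """
    scored = {name: group_scores(u) for name, u in plans.items()}
    fair = {n: s for n, s in scored.items() if s["fairness"] >= fairness_floor}
    pool = fair or scored
    return max(pool, key=lambda n: pool[n]["utility"])


# Hypothetical example: one plan delights two members and neglects the third.
plans = {
    "museum_day": {"ana": 1.0, "ben": 0.9, "cy": 0.2},
    "split_day": {"ana": 0.7, "ben": 0.7, "cy": 0.6},
}
print(pick_plan(plans, fairness_floor=0.5))  # fairness constraint active
print(pick_plan(plans, fairness_floor=0.0))  # pure utility maximization
```

With the fairness floor active, the sketch prefers the plan that no member strongly dislikes, even though the other plan has a higher average utility. Dropping the floor reverses the choice, which is the trade-off the planning competency targets.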

We complement the benchmark with an evaluation framework that combines rule-based outcome metrics with LLM-judge process metrics. Our evaluation of several frontier models shows that even the strongest agents struggle to satisfy all four rule-based outcome metrics, achieving a plan validity rate below 12%. This gap indicates that ensuring high-quality group-level outcomes remains a critical, unsolved challenge for LLM travel-planning agents.
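One way to read a plan validity rate over several rule-based checks is sketched below. It assumes a plan counts as valid only when every check passes; the abstract does not spell out this definition, and it does not name the four metrics, so the check names used here are hypothetical placeholders.

```python
def plan_is_valid(check_results):
    """Return True only if every rule-based check passed (assumed definition).

    check_results: dict mapping check name -> bool.
    """
    return all(check_results.values())


def validity_rate(all_results):
    """Fraction of tasks whose plan passes every check; 0.0 for an empty list."""
    if not all_results:
        return 0.0
    return sum(plan_is_valid(r) for r in all_results) / len(all_results)


# Hypothetical check names and outcomes for four tasks.
results = [
    {"check_a": True, "check_b": True, "check_c": True, "check_d": True},
    {"check_a": True, "check_b": False, "check_c": True, "check_d": True},
    {"check_a": True, "check_b": True, "check_c": True, "check_d": False},
    {"check_a": False, "check_b": True, "check_c": True, "check_d": True},
]
print(validity_rate(results))
```

Under this conjunctive reading, a model can pass most individual checks on most tasks and still have a low validity rate, since a single failed check invalidates the whole plan.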


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
