GroupTravelBench: Benchmarking LLM Agents on Multi-Person Travel Planning
Abstract:
Although real-world travel planning is predominantly a collective endeavor, current LLM benchmarks in this domain largely reduce it to a single user, a setting in which performance is approaching saturation. This single-user paradigm overlooks the core complexities of group coordination: uncovering individual preferences, identifying interpersonal conflicts, and balancing overall utility with fairness. To align evaluation with the multi-user nature of real travel planning, we present GroupTravelBench, the first benchmark designed for multi-user, multi-turn travel scenarios.
Built from real user profiles, Point of Interest (POI) data, and ticket pricing information, the benchmark comprises 650 tasks across three difficulty tiers. Tasks run in a synchronous group-chat sandbox backed by cached tool data, which makes evaluation reproducible and fully offline. Whereas single-user benchmarks typically assess multi-step reasoning and tool use, GroupTravelBench additionally evaluates three group-oriented competencies: (i) elicitation, drawing out private preferences through iterative dialogue; (ii) coordination, resolving inter-user conflicts through compromise or subgroup formation; and (iii) planning, optimizing group utility while maintaining fairness.
We pair the benchmark with an evaluation framework that combines rule-based outcome metrics with LLM-judge process metrics. Experiments with a range of frontier models show that even the strongest agents rarely satisfy all four rule-based outcome metrics at once, reaching a plan validity rate below 12%. This gap indicates that producing high-quality outcomes at the group level remains a critical, unsolved challenge for LLM travel-planning agents.
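To make the relationship between the four outcome metrics and the plan validity rate concrete, the sketch below assumes that a plan counts as valid only when it passes all four rule-based checks, and that the validity rate is the fraction of tasks with a valid plan. The abstract does not name the four metrics, so they are left as generic booleans; `TaskResult`, `plan_validity_rate`, and `per_metric_pass_rates` are hypothetical names for illustration, not the benchmark's actual API. The toy data shows why the joint rate can sit far below each individual metric's pass rate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskResult:
    """Hypothetical per-task record: one boolean per rule-based outcome metric."""
    task_id: str
    outcome_checks: List[bool]  # assumed: exactly four entries, metric names unspecified


def is_valid(result: TaskResult, expected_checks: int = 4) -> bool:
    """Assumption: a plan is valid only if every outcome check passes."""
    if len(result.outcome_checks) != expected_checks:
        raise ValueError(f"{result.task_id}: expected {expected_checks} checks")
    return all(result.outcome_checks)


def plan_validity_rate(results: List[TaskResult]) -> float:
    """Fraction of tasks whose plan passes all outcome checks simultaneously."""
    if not results:
        return 0.0
    return sum(is_valid(r) for r in results) / len(results)


def per_metric_pass_rates(results: List[TaskResult], expected_checks: int = 4) -> List[float]:
    """Pass rate of each metric considered on its own, for comparison."""
    if not results:
        return [0.0] * expected_checks
    return [
        sum(r.outcome_checks[i] for r in results) / len(results)
        for i in range(expected_checks)
    ]


if __name__ == "__main__":
    # Illustrative toy data only; not results from the paper.
    demo = [
        TaskResult("t1", [True, True, True, True]),
        TaskResult("t2", [True, False, True, True]),
        TaskResult("t3", [True, True, True, False]),
        TaskResult("t4", [False, False, True, True]),
    ]
    print(per_metric_pass_rates(demo))  # [0.75, 0.5, 1.0, 0.75]
    print(plan_validity_rate(demo))     # 0.25
```

Every metric individually passes on at least half of the toy tasks, yet only one task in four passes all of them, which mirrors how a strict conjunctive criterion can keep the overall validity rate low.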
Source: arXiv. Generated at: 2026-06-04 00:00:00 UTC