arXiv

Tree-Guided Identify-Then-Exploit: A Unified Framework of Best Arm Identification and Regret Minimization for Dueling Bandits

June 2, 2026 · Pu Wang, Yao-Xiang Ding · Original Source

Title: Tree-Guided Identify-Then-Exploit: A Unified Framework for Best Arm Identification and Regret Minimization in Dueling Bandits

Abstract:

This paper investigates $N$-armed stochastic dueling bandits operating under the Condorcet-winner assumption, focusing on three prevalent objectives: best-arm identification (BAI), weak regret, and strong regret. To the best of our knowledge, we introduce Tree-Guided Identify-Then-Exploit (TG-ITE), the inaugural unified framework capable of addressing all three goals simultaneously. By avoiding the need for stricter assumptions, we introduce a common tree-guided identification mechanism designed to locate a high-confidence incumbent in $O(N)$ comparisons. Furthermore, we develop distinct exploitation strategies that leverage this warm-start phase to optimize for the specific objective at hand.

This methodology yields four key advantages: (1) it attains $O(N)$ sample complexity for BAI without relying on the stronger assumptions typically required; (2) it establishes the first winner-stays-style algorithm capable of achieving $O(N)$ weak regret; (3) it matches the $O(N \log T)$ performance guarantee characteristic of specialized strong-regret methods; and (4) it enables the joint optimization of BAI and weak regret, providing $O(N)$ guarantees for both and thereby closing the $O(\log N)$ sub-optimality gap present in prior approaches. These findings suggest that the trade-off between BAI and regret minimization is comparatively mild within the context of dueling bandits.

Source: arXiv Generated at: 2026-06-02 00:00:00 UTC