Two-Action Apple Tasting with Switching Costs
Title: Minimizing Regret in Two-Action Apple-Tasting with Switching Costs
Abstract:
This paper investigates the two-action apple-tasting scenario involving switching costs, analyzed against an oblivious adversary. We present an equivalent normalized framework where, in each round, the learner must select between a revealing action and a blind action. Choosing the revealing action yields a reward of $0$ but exposes the hidden value $x_t\in[-1,1]$ associated with the blind action. Conversely, selecting the blind action provides a reward equal to $x_t$ while offering no information about the hidden value. The learner incurs a cost of one unit every time they switch between actions, and performance is evaluated via regret relative to the optimal fixed action chosen in hindsight.
While general algorithms for feedback graphs with switching costs typically provide $\widetilde O(T^{2/3})$ regret bounds for this specific problem, the two-action apple-tasting graph was previously considered a prime candidate for establishing a missing $\Omega(T^{2/3})$ lower bound. Such a lower bound would have extended to a broad class of yet-to-be-classified feedback graphs. However, we demonstrate that this obstruction does not exist. Specifically, we establish that the oblivious minimax expected regret, $R_T^\star$, for this problem is bounded as follows:
\[ \frac{1}{2\sqrt{3}}\cdot\sqrt{T} \le R_T^\star \le 2\sqrt{3}\cdot\sqrt{T}. \]
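To make the normalized game concrete, the following is a minimal Python sketch of a single play-through and its regret against the best fixed action in hindsight. It is an illustration under stated assumptions, not the paper's algorithm: the names `REVEAL`, `BLIND`, `play`, `regret`, and `minimax_bounds` are hypothetical, the first round is assumed not to count as a switch, and the comparator is assumed to pay no switching cost.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical action labels for the normalized two-action game.
REVEAL, BLIND = 0, 1


def play(actions: Sequence[int], xs: Sequence[float],
         switch_cost: float = 1.0) -> Tuple[float, List[Optional[float]]]:
    """Return (net reward, per-round feedback) for a fixed action sequence.

    REVEAL earns 0 and observes x_t; BLIND earns x_t and observes nothing.
    Assumption: the first round is never charged as a switch.
    """
    if len(actions) != len(xs):
        raise ValueError("actions and xs must have the same length")
    total, switches, prev = 0.0, 0, None
    feedback: List[Optional[float]] = []
    for a, x in zip(actions, xs):
        if not -1.0 <= x <= 1.0:
            raise ValueError("hidden values must lie in [-1, 1]")
        if prev is not None and a != prev:
            switches += 1          # unit cost per change of action
        if a == BLIND:
            total += x             # reward x_t, no information
            feedback.append(None)
        else:
            feedback.append(x)     # reward 0, x_t is revealed
        prev = a
    return total - switch_cost * switches, feedback


def regret(actions: Sequence[int], xs: Sequence[float]) -> float:
    """Regret against the best fixed action: always REVEAL (0) or always BLIND."""
    net, _ = play(actions, xs)
    best_fixed = max(0.0, sum(xs))
    return best_fixed - net


def minimax_bounds(T: int) -> Tuple[float, float]:
    """Lower and upper bounds on R_T^* as stated in the abstract."""
    root = math.sqrt(T)
    return root / (2 * math.sqrt(3)), 2 * math.sqrt(3) * root
```

For example, with hidden values `[0.5, -1.0, 0.5, 1.0]` and actions `[REVEAL, REVEAL, BLIND, BLIND]`, the learner collects 1.5 from the blind rounds, pays 1 for the single switch, and the best fixed action (always blind) earns 1.0.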
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



