Mean-based algorithms: A lower bound and regret
Title: Mean-Based Algorithms: Establishing Lower Bounds and Regret Analysis
Abstract
Mean-based algorithms are a class of online learning methods that choose actions according to their historical average rewards: an action whose cumulative reward trails that of the best action by a sufficient margin is selected only with small probability. Recent work indicates that these algorithms converge to serially undominated actions, which serve as approximations of Nash equilibria in economic settings. However, empirical evidence suggests that they may converge more slowly than established methods when only bandit feedback is available.
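To make the mean-based property concrete, the following Python sketch checks whether a vector of selection probabilities satisfies it at round $t$. It assumes one common formalization with a time-varying parameter: if action $i$'s cumulative reward trails the best action's by more than $\gamma_t \cdot t$, then $i$ may be selected with probability at most $\gamma_t$. The function name `mean_based_violations` is hypothetical, and the paper's exact normalization (for example, $\gamma_t \cdot t$ versus $\gamma_t \cdot T$) may differ.

```python
from typing import List, Sequence


def mean_based_violations(
    cum_rewards: Sequence[float],
    probs: Sequence[float],
    t: int,
    gamma_t: float,
) -> List[int]:
    """Return the actions that break the (assumed) gamma_t-mean-based condition.

    Assumed condition: if an action's cumulative reward trails the best
    action's by more than gamma_t * t, it may be played with probability
    at most gamma_t.
    """
    if len(cum_rewards) != len(probs):
        raise ValueError("cum_rewards and probs must have the same length")
    # Trailing "some" action by the margin is equivalent to trailing the best one.
    best = max(cum_rewards)
    violations = []
    for i, (sigma_i, p_i) in enumerate(zip(cum_rewards, probs)):
        trails_best = sigma_i < best - gamma_t * t
        if trails_best and p_i > gamma_t:
            violations.append(i)
    return violations


if __name__ == "__main__":
    # Round 100, gamma_t = 0.1: any action below 60 - 10 = 50 must have prob <= 0.1.
    print(mean_based_violations([60.0, 30.0, 58.0], [0.5, 0.2, 0.3], t=100, gamma_t=0.1))
```

Here action 1 (cumulative reward 30) trails the leader by more than the margin yet is played with probability 0.2, so it is reported as a violation; lowering its probability to at most 0.1 would make the distribution mean-based under this assumed definition.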
This study investigates mean-based algorithms when the time horizon is unknown and only bandit feedback is available. We present the first lower bound stated in terms of the sequence $\gamma_t$ that defines a mean-based algorithm; this bound limits how quickly such algorithms can learn. We also introduce two new mean-based algorithms: one generalizes $\epsilon$-greedy strategies, and the other extends mean-based Exp3 to unknown time horizons.
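The two algorithm families mentioned above can be illustrated with generic, horizon-free Python sketches. These are not the paper's algorithms: the class names `EpsilonGreedy` and `AnytimeExp3`, the schedule $\epsilon_t = \min(1, c\,t^{-1/3})$, the learning rate $\eta_t = \sqrt{\ln K / (tK)}$, and the assumption that rewards lie in $[0, 1]$ are all illustrative choices. Neither sketch needs the horizon $T$; both favor actions with higher estimated average or cumulative reward.

```python
import math
import random
from typing import List, Optional


class EpsilonGreedy:
    """Decaying epsilon-greedy on empirical means (hypothetical schedule)."""

    def __init__(self, n_actions: int, c: float = 1.0, seed: Optional[int] = None):
        self.k = n_actions
        self.c = c  # hypothetical exploration constant
        self.counts = [0] * n_actions
        self.sums = [0.0] * n_actions
        self.t = 0
        self.rng = random.Random(seed)

    def epsilon(self) -> float:
        # Assumed horizon-free schedule: eps_t = min(1, c * t^(-1/3)).
        return min(1.0, self.c * max(self.t, 1) ** (-1.0 / 3.0))

    def probabilities(self) -> List[float]:
        eps = self.epsilon()
        # Untried actions get +inf so the greedy choice tries them first.
        means = [s / n if n > 0 else math.inf for s, n in zip(self.sums, self.counts)]
        leader = max(range(self.k), key=lambda i: means[i])
        probs = [eps / self.k] * self.k  # uniform exploration mass
        probs[leader] += 1.0 - eps       # exploitation mass on the empirical leader
        return probs

    def select(self) -> int:
        self.t += 1
        return self.rng.choices(range(self.k), weights=self.probabilities())[0]

    def update(self, action: int, reward: float) -> None:
        self.counts[action] += 1
        self.sums[action] += reward


class AnytimeExp3:
    """Exp3 with a horizon-free learning rate eta_t = sqrt(ln K / (t K))."""

    def __init__(self, n_actions: int, seed: Optional[int] = None):
        self.k = n_actions
        self.loss_est = [0.0] * n_actions  # importance-weighted cumulative losses
        self.t = 0
        self.rng = random.Random(seed)
        self.last_probs: List[float] = [1.0 / n_actions] * n_actions

    def probabilities(self) -> List[float]:
        eta = math.sqrt(math.log(self.k) / (max(self.t, 1) * self.k))
        lowest = min(self.loss_est)  # shift exponents for numerical stability
        weights = [math.exp(-eta * (loss - lowest)) for loss in self.loss_est]
        total = sum(weights)
        return [w / total for w in weights]

    def select(self) -> int:
        self.t += 1
        self.last_probs = self.probabilities()
        return self.rng.choices(range(self.k), weights=self.last_probs)[0]

    def update(self, action: int, reward: float) -> None:
        # Bandit feedback: only the played action's loss (1 - reward) is seen,
        # so dividing by its selection probability gives an unbiased estimate.
        self.loss_est[action] += (1.0 - reward) / self.last_probs[action]


if __name__ == "__main__":
    # Toy Bernoulli bandit with hypothetical means; no results are claimed.
    means = [0.2, 0.5, 0.8]
    env = random.Random(1)
    for algo in (EpsilonGreedy(3, seed=0), AnytimeExp3(3, seed=0)):
        total = 0.0
        for _ in range(5000):
            a = algo.select()
            r = 1.0 if env.random() < means[a] else 0.0
            algo.update(a, r)
            total += r
        print(type(algo).__name__, round(total / 5000, 3))
```

In the $\epsilon$-greedy sketch every non-leading action is played with probability $\epsilon_t / K$, the kind of decaying bound a mean-based definition asks for; whether a particular schedule satisfies the paper's $\gamma_t$ condition is not settled by this code. The Exp3 sketch uses the loss-based importance-weighted estimator, which keeps estimates bounded below and is a common choice when no explicit exploration is mixed in.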
Our experiments show that mean-based algorithms, although somewhat slower, remain competitive with other bandit-feedback algorithms. We also study the relationship between mean-based and no-regret algorithms. The intersection of the two classes is non-trivial and depends on the choice of $\gamma_t$; in particular, we prove that there exist algorithms that are both mean-based and no-regret. These results extend previous insights into the "exploitability" of this class of algorithms.
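As a reminder of the quantity behind the no-regret property, the Python sketch below computes the standard external regret of a realized action sequence against the best fixed action in hindsight, $R_T = \max_i \sum_{t=1}^{T} r_{t,i} - \sum_{t=1}^{T} r_{t,I_t}$. The summary does not say which regret notion (realized, expected, or pseudo-regret) the paper uses, so this is only the textbook definition; `external_regret` is a hypothetical helper.

```python
from typing import Sequence


def external_regret(rewards: Sequence[Sequence[float]], actions: Sequence[int]) -> float:
    """External regret of a played action sequence.

    rewards[t][i] is the reward action i would have earned in round t;
    actions[t] is the action actually played in round t.
    """
    if len(rewards) != len(actions):
        raise ValueError("need exactly one played action per round")
    if not rewards:
        return 0.0
    n_actions = len(rewards[0])
    # Best fixed action in hindsight: largest column sum of the reward table.
    best_fixed = max(sum(row[i] for row in rewards) for i in range(n_actions))
    # Reward the learner actually collected.
    earned = sum(row[a] for row, a in zip(rewards, actions))
    return best_fixed - earned


if __name__ == "__main__":
    table = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(external_regret(table, [1, 1, 1]))  # 2.0 - 1.0 = 1.0
```

An algorithm is called no-regret when this quantity (or its expectation) grows as $o(T)$, i.e. $R_T / T \to 0$; regret can be negative when the learner beats every fixed action, as with the sequence `[0, 0, 1]` on the table above.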
Source: arXiv Generated at: 2026-06-04 00:00:00 UTC






