arXiv

Finite-Time Regret Analysis of Retry-Aware Bandits

Abstract: This paper studies a stochastic bandit algorithm for retry-aware objectives, such as pass@$k$ and max@$k$, which score the best result achieved across multiple trials. Given a posterior distribution over arm values, the ReMax method selects the sampling distribution that maximizes the posterior expected maximum reward over $M$ hypothetical draws. Although this objective has previously served as an exploration strategy in reinforcement learning under uncertainty, its regret properties in bandit settings have not been well understood. We focus on Gaussian rewards and the first nontrivial case, $M=2$. By establishing an expected-improvement balance condition, we characterize the optimal ReMax distribution and prove the first sublinear regret bound for this approach. Our analysis separates the standard saturation of suboptimal arms from a phenomenon specific to ReMax: an underestimation effect in which the optimal arm is sampled too infrequently after a pessimistic estimate. This effect explains why ReMax tends to be more exploitative than Thompson sampling (TS) and why its regret analysis is technically difficult. Empirical results agree with the theory: ReMax generally outperforms both KL-UCB and Thompson sampling when underestimation is mild, while scaling the posterior variance mitigates the impact of severe underestimation.
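To make the $M=2$ objective described in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes independent Gaussian posteriors over arm values, assumes the maximum is taken over arm values (so drawing the same arm twice contributes that arm's posterior mean), and uses the standard closed form for the expected maximum of two independent Gaussians. The function names and the two-arm grid search are hypothetical, chosen only for illustration.

```python
import math


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_max_gaussian(mu_i, sd_i, mu_j, sd_j):
    """E[max(X, Y)] for independent X ~ N(mu_i, sd_i^2), Y ~ N(mu_j, sd_j^2)."""
    s = math.sqrt(sd_i ** 2 + sd_j ** 2)
    if s == 0.0:
        return max(mu_i, mu_j)
    a = (mu_i - mu_j) / s
    return mu_i * normal_cdf(a) + mu_j * normal_cdf(-a) + s * normal_pdf(a)


def remax_objective(p, mus, sds):
    """Posterior expected max over M = 2 draws from sampling distribution p.

    Assumption: a repeated arm (i == j) contributes its posterior mean.
    """
    total = 0.0
    for i in range(len(p)):
        for j in range(len(p)):
            if i == j:
                w = mus[i]
            else:
                w = expected_max_gaussian(mus[i], sds[i], mus[j], sds[j])
            total += p[i] * p[j] * w
    return total


def best_two_arm_distribution(mus, sds, grid=1000):
    """Grid search over p = (q, 1 - q) for a two-arm problem; returns q."""
    best = max(
        range(grid + 1),
        key=lambda i: remax_objective((i / grid, 1.0 - i / grid), mus, sds),
    )
    return best / grid


if __name__ == "__main__":
    # Two arms with identical posteriors: mass is split evenly.
    print(best_two_arm_distribution((0.0, 0.0), (1.0, 1.0)))
    # A confidently better first arm: mass concentrates on it.
    print(best_two_arm_distribution((1.0, 0.0), (0.01, 0.01)))
```

Under these assumptions, the objective is a quadratic form in the sampling distribution, so its maximizer over the simplex equalizes a weighted expected gain across the arms that receive positive mass. The grid search is only practical for two arms; a larger problem would need a proper constrained optimizer.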


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
