arXiv

Offline-to-Online Learning in Linear Bandits

June 4, 2026 · Kushagra Chandak, Toshinori Kitamura, Xiaoqi Tan

Title: Bridging Offline and Online Learning in Linear Bandit Frameworks

Abstract: This paper studies online learning with access to a pre-existing offline dataset in stochastic linear bandits. Such scenarios are common in real-world applications, yet the trade-off between offline and online learning strategies in structured settings has not been thoroughly explored. To address this, we introduce a linear bandit algorithm designed to balance the two. The method relies on offline data during the early phases of interaction and progressively shifts toward exploration as the time horizon grows. We derive regret bounds showing that the approach is competitive with both purely online and purely offline baselines. Specifically, the algorithm guarantees regret with respect to the optimal action that is sublinear in the number of online interactions, while its regret with respect to an offline reference metric diminishes as the number of offline samples grows. Our empirical evaluations support the robustness and effectiveness of the method across a range of problem parameters.
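The abstract does not specify the algorithm, so the following is only a minimal illustrative sketch of the general idea, not the authors' method. It assumes a ridge-regression estimate warm-started with the offline data and a hypothetical linear schedule `c` that moves the arm score from a pessimistic lower-confidence bound (trusting the offline data) to an optimistic upper-confidence bound (exploring) over the horizon. The function name, the schedule, and the parameters `lam`, `beta`, and `noise` are all assumptions made for illustration.

```python
import numpy as np

def offline_to_online_sketch(X_off, y_off, arms, theta_true, T,
                             lam=1.0, beta=1.0, noise=0.1, seed=0):
    """Hypothetical warm-started linear bandit; returns cumulative regret."""
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    # Ridge statistics initialised from the offline dataset.
    V = lam * np.eye(d) + X_off.T @ X_off
    b = X_off.T @ y_off
    best = np.max(arms @ theta_true)
    regret = np.zeros(T)
    for t in range(T):
        V_inv = np.linalg.inv(V)
        theta_hat = V_inv @ b
        # Confidence width sqrt(x^T V^{-1} x) for every arm.
        width = np.sqrt(np.einsum("ij,jk,ik->i", arms, V_inv, arms))
        # Assumed schedule: -beta (pessimistic) early, +beta (optimistic) late.
        c = beta * (2.0 * (t + 1) / T - 1.0)
        a = int(np.argmax(arms @ theta_hat + c * width))
        x = arms[a]
        reward = x @ theta_true + noise * rng.standard_normal()
        # Online update of the ridge statistics.
        V += np.outer(x, x)
        b += reward * x
        regret[t] = best - x @ theta_true
    return np.cumsum(regret)

# Toy usage with synthetic data (all values are illustrative).
rng = np.random.default_rng(1)
theta_true = np.array([1.0, 0.5])
arms = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
X_off = arms[rng.integers(0, 3, size=50)]
y_off = X_off @ theta_true + 0.1 * rng.standard_normal(50)
cum_regret = offline_to_online_sketch(X_off, y_off, arms, theta_true, T=200)
```

The sign change in `c` is the only place where the offline-versus-online trade-off enters this sketch; the paper's actual rule and its regret guarantees may differ substantially.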

Source: arXiv · Generated at: 2026-06-04 00:00:00 UTC