A Direct Approach for Handling Contextual Bandits with Latent State Dynamics
Abstract: This study examines a linear contextual bandit framework in which both the contexts and the rewards are governed by a hidden Markov model (HMM) with a finite state space. First, we revisit the simplified model proposed by Nelson et al. (2022), in which rewards are linear functions of the posterior probabilities over the hidden states given the observed contexts (referred to as beliefs), rather than of the hidden states themselves. This simplified model can be handled by a direct reduction to standard linear contextual bandits. We extend the analysis of this reduction so that the regret bound accounts for the estimation of the HMM parameters. Furthermore, we establish high-probability bounds that hold independently of the reward functions and depend on the model's characteristics only through the estimation of the HMM parameters. Second, and most significantly, we study a more natural but more complex model in which rewards depend directly on the hidden states, in addition to the dependence on the observed contexts that is typical of contextual bandits. Under a standard forgetting condition on the HMM, our main algorithmic idea for handling the statistical dependencies introduced by this reward structure is to update the reward-model parameters periodically.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
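To make the belief-based reduction described in the abstract concrete, the following is a minimal sketch under several assumptions that do not come from the source: contexts are discrete, the HMM transition matrix `P` and emission matrix `O` are known (the paper instead accounts for estimating them), and each arm `a` has a hypothetical reward vector so that the expected reward is the belief vector dotted with that arm's vector. The belief is computed with the standard HMM forward filter, and disjoint LinUCB is used as a stand-in for "a conventional linear contextual bandit"; this is not the authors' algorithm, and all names (`filter_step`, `BeliefLinUCB`, `simulate`) and numeric parameters are illustrative.

```python
import numpy as np


def filter_step(belief, P, O, x):
    """One step of the HMM forward filter (assumes known P and O).

    belief: posterior over hidden states given the previous contexts, shape (S,)
    P: transition matrix, P[i, j] = Pr(s_t = j | s_{t-1} = i)
    O: emission matrix, O[j, k] = Pr(x_t = k | s_t = j)
    x: index of the newly observed discrete context
    """
    predicted = belief @ P              # predict: Pr(s_t | x_1..x_{t-1})
    unnormalized = predicted * O[:, x]  # correct: weight by likelihood of x_t
    return unnormalized / unnormalized.sum()


class BeliefLinUCB:
    """Disjoint LinUCB that treats the belief vector as the context feature."""

    def __init__(self, n_states, n_arms, reg=1.0, alpha=1.0):
        self.alpha = alpha
        # Per-arm ridge-regression Gram matrices and response vectors.
        self.A = np.stack([reg * np.eye(n_states) for _ in range(n_arms)])
        self.b = np.zeros((n_arms, n_states))

    def select(self, belief):
        scores = []
        for A_a, b_a in zip(self.A, self.b):
            A_inv = np.linalg.inv(A_a)
            theta = A_inv @ b_a  # ridge estimate of this arm's reward vector
            bonus = self.alpha * np.sqrt(belief @ A_inv @ belief)
            scores.append(belief @ theta + bonus)
        return int(np.argmax(scores))

    def update(self, arm, belief, reward):
        self.A[arm] += np.outer(belief, belief)
        self.b[arm] += reward * belief


def simulate(T=2000, seed=0):
    """Toy 2-state, 2-context, 2-arm run; all parameters are hypothetical."""
    rng = np.random.default_rng(seed)
    P = np.array([[0.95, 0.05], [0.10, 0.90]])
    O = np.array([[0.8, 0.2], [0.3, 0.7]])
    theta_true = np.array([[1.0, 0.0],   # arm 0 pays off in state 0
                           [0.0, 1.0]])  # arm 1 pays off in state 1
    learner = BeliefLinUCB(n_states=2, n_arms=2, alpha=0.5)
    state = rng.choice(2)              # initial state drawn from a uniform prior
    belief = np.array([0.5, 0.5])      # learner's matching uniform prior
    total = 0.0
    for _ in range(T):
        state = rng.choice(2, p=P[state])  # latent state evolves, unobserved
        x = rng.choice(2, p=O[state])      # only the context is observed
        belief = filter_step(belief, P, O, x)
        arm = learner.select(belief)
        # Simplified model: expected reward is linear in the belief.
        reward = belief @ theta_true[arm] + 0.1 * rng.standard_normal()
        learner.update(arm, belief, reward)
        total += reward
    return learner, total


if __name__ == "__main__":
    learner, total = simulate()
    print(f"cumulative reward over 2000 rounds: {total:.1f}")
```

The point of the reduction is visible in the loop: once the belief is computed from the contexts alone, the bandit subroutine never touches the hidden state, so any linear contextual bandit could replace `BeliefLinUCB` here.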





