Evaluating and Learning Robust Bandit Policies Under Uncertain Causal Mechanisms
Title: Assessing and Acquiring Resilient Bandit Strategies Amid Uncertain Causal Dynamics
Abstract: Causal graphical models are powerful tools for encoding rich structural knowledge, drawn both from expert domain knowledge and from patterns identified in randomized trials or observational studies. However, even when the topology of the causal links is understood, the precise form of the underlying causal mechanisms often remains unknown. This study introduces a novel evaluation and learning framework for causal multi-armed bandits that is designed to remain robust under uncertainty about the conditional probability distributions. We also show how conditional independence tests can be used to select which variables to model. Our findings indicate that the Structural Equation Model (SEM) approach evaluates policies more accurately than conventional methods, and that this advantage grows as the range of candidate causal mechanisms widens. Moreover, the SEM approach finds low-variance policies and, provided the model is adequately specified, identifies the optimal policy. Traditional techniques, by contrast, are prone to converging to local optima or failing to converge at all.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





