ALMAB-DC: Active Learning, Multi-Armed Bandits, and Distributed Computing for Sequential Experimental Design and Black-Box Optimization
Title: ALMAB-DC: Leveraging Active Learning, Multi-Armed Bandits, and Distributed Computing for Black-Box Optimization and Sequential Experimental Design
Abstract
Sequential experimental design presents a significant hurdle in computational statistics, particularly when dealing with expensive, gradient-free objectives. In such scenarios, evaluation budgets are strictly limited, necessitating the efficient extraction of information from every observation. To address this, we introduce ALMAB-DC, a framework grounded in Gaussian Processes (GP) that integrates active learning, multi-armed bandits (MAB), and distributed asynchronous computing for expensive black-box experiments.
The architecture employs a Gaussian process surrogate paired with an uncertainty-aware acquisition function to pinpoint informative query points. A bandit controller, utilizing either Upper Confidence Bound (UCB) or Thompson sampling, manages the allocation of evaluations among parallel workers, while an asynchronous scheduler accommodates varying runtime durations. We establish cumulative regret bounds for the bandit elements and analyze parallel scalability through the lens of Amdahl's Law.
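To make the bandit controller concrete, the sketch below shows the textbook UCB1 rule on simulated Bernoulli arms. It is illustrative only: the abstract does not specify ALMAB-DC's controller, so the function names `ucb1_select` and `run_ucb1`, the exploration constant `c`, and the reward model are all assumptions made for this example.

```python
import math
import random


def ucb1_select(counts, means, t, c=2.0):
    """Return the index of the arm with the largest UCB1 score."""
    # Pull every arm once before trusting the confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [m + math.sqrt(c * math.log(t) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def run_ucb1(true_means, horizon, seed=0):
    """Simulate UCB1 on Bernoulli arms (hypothetical reward model).

    Returns the per-arm pull counts and the cumulative pseudo-regret.
    """
    rng = random.Random(seed)
    counts = [0] * len(true_means)
    means = [0.0] * len(true_means)
    best = max(true_means)
    regret = 0.0
    for t in range(1, horizon + 1):
        arm = ucb1_select(counts, means, t)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
        regret += best - true_means[arm]  # gap of the arm actually pulled
    return counts, regret


counts, regret = run_ucb1([0.2, 0.5, 0.8], horizon=2000)
```

In this toy setting the pull counts concentrate on the best arm and the pseudo-regret grows far more slowly than under uniform allocation, which is the behaviour that cumulative regret bounds formalize.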
We validated ALMAB-DC on five benchmark tasks. In two statistical experimental design applications, the framework performed at or above the level of the baselines: in dose–response optimization it yielded lower simple regret than Equal Spacing, Random, and D-optimal designs, and in adaptive spatial field estimation it matched the Greedy Max-Variance benchmark while surpassing Latin Hypercube Sampling. In a distributed setup with $K=4$, the system reached target performance in one-quarter of the sequential wall-clock rounds.
We further evaluated ALMAB-DC on three machine learning and engineering benchmarks: CIFAR-10 hyperparameter optimization (HPO), computational fluid dynamics (CFD) drag minimization, and MuJoCo reinforcement learning (RL). It attained 93.4% accuracy on CIFAR-10, exceeding BOHB by 1.7 percentage points and Optuna by 1.1 percentage points. For airfoil drag minimization, it reduced the drag coefficient to $C_D = 0.059$, a 36.9% improvement over Grid Search. In RL tasks, it improved returns by 50% relative to Grid Search. All performance gains over non-ALMAB baselines were statistically significant under Bonferroni-corrected Mann–Whitney $U$ tests. Additionally, distributed execution yielded a $7.5\times$ speedup at $K = 16$ agents, consistent with predictions from Amdahl's Law.
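As a rough consistency check on the reported speedup, Amdahl's Law, $S(K) = 1 / ((1 - p) + p/K)$, can be inverted to find the parallel fraction $p$ that a given speedup would imply. The abstract does not report $p$, so the sketch below back-solves it under the assumption that the $7.5\times$ figure at $K = 16$ follows Amdahl's Law exactly; the helper names are hypothetical.

```python
def amdahl_speedup(p, k):
    """Amdahl's Law: speedup on k workers when a fraction p of the work parallelizes."""
    return 1.0 / ((1.0 - p) + p / k)


def implied_parallel_fraction(speedup, k):
    """Invert Amdahl's Law: 1/S = 1 - p * (1 - 1/k), solved for p."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / k)


# Assumption: the reported 7.5x at K = 16 lies exactly on the Amdahl curve.
p = implied_parallel_fraction(7.5, 16)
ceiling = 1.0 / (1.0 - p)  # asymptotic speedup as k grows without bound
```

Under that assumption the implied parallel fraction is about 0.92, which caps the achievable speedup at roughly $13\times$ no matter how many agents are added.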
Source: arXiv Generated at: 2026-06-04 00:00:00 UTC


