arXiv

ALMAB-DC: Active Learning, Multi-Armed Bandits, and Distributed Computing for Sequential Experimental Design and Black-Box Optimization

Abstract

Sequential experimental design presents a significant hurdle in computational statistics, particularly when dealing with expensive, gradient-free objectives. In such scenarios, evaluation budgets are strictly limited, necessitating the efficient extraction of information from every observation. To address this, we introduce ALMAB-DC, a framework grounded in Gaussian Processes (GP) that integrates active learning, multi-armed bandits (MAB), and distributed asynchronous computing for expensive black-box experiments.

The architecture employs a Gaussian process surrogate paired with an uncertainty-aware acquisition function to pinpoint informative query points. A bandit controller, utilizing either Upper Confidence Bound (UCB) or Thompson sampling, manages the allocation of evaluations among parallel workers, while an asynchronous scheduler accommodates varying runtime durations. We establish cumulative regret bounds for the bandit elements and analyze parallel scalability through the lens of Amdahl's Law.
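To make the bandit controller concrete, the following is a minimal sketch of the textbook UCB1 rule on simulated Bernoulli arms. It is not the authors' controller: the function names `ucb1_select` and `run_ucb1`, the exploration constant `c`, and the Bernoulli reward model are all assumptions made for illustration, and the abstract does not say how arms map onto workers or candidate regions.

```python
import math
import random


def ucb1_select(counts, means, t, c=2.0):
    """Return the index of the arm with the highest UCB1 score.

    counts[i] is the number of pulls of arm i, means[i] its running
    mean reward, and t the current round (1-based). All names here
    are hypothetical and chosen for illustration only.
    """
    # Pull every arm once before trusting the confidence bounds.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Score = empirical mean + exploration bonus that shrinks with pulls.
    scores = [m + math.sqrt(c * math.log(t) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def run_ucb1(reward_probs, rounds, seed=0):
    """Simulate UCB1 on Bernoulli arms and return the pull count per arm."""
    rng = random.Random(seed)
    k = len(reward_probs)
    counts = [0] * k
    means = [0.0] * k
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, means, t)
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts


if __name__ == "__main__":
    pulls = run_ucb1([0.1, 0.9], rounds=2000)
    print(pulls)  # most pulls should go to the second (better) arm
```

The exploration bonus is what keeps cumulative regret growing slowly: an arm that has been pulled rarely keeps a wide confidence interval and is revisited until the evidence rules it out. Thompson sampling, the other option named above, would instead draw from a posterior over each arm's mean and pick the largest draw.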

We validate ALMAB-DC on five benchmark tasks. The first two are statistical experimental design applications. In dose–response optimization, the framework yielded lower simple regret than Equal Spacing, Random, and D-optimal designs. In adaptive spatial field estimation, it matched the Greedy Max-Variance benchmark while surpassing Latin Hypercube Sampling. In a distributed setup with $K=4$, the system reached target performance in one-quarter of the sequential wall-clock rounds.

On three machine learning and engineering benchmarks—CIFAR-10 Hyperparameter Optimization (HPO), Computational Fluid Dynamics (CFD) drag minimization, and MuJoCo Reinforcement Learning (RL)—ALMAB-DC delivered strong results. It attained a 93.4% accuracy rate on CIFAR-10, exceeding BOHB by 1.7 percentage points and Optuna by 1.1 percentage points. For airfoil drag minimization, it reduced the drag coefficient to $C_D = 0.059$, marking a 36.9% improvement over Grid Search. In RL tasks, it boosted returns by 50% relative to Grid Search. All performance gains over non-ALMAB baselines were confirmed as statistically significant using Bonferroni-corrected Mann–Whitney $U$ tests. Additionally, distributed execution yielded a $7.5\times$ speedup at $K = 16$ agents, aligning with predictions from Amdahl's Law.
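As a back-of-envelope check on the reported scaling, the sketch below inverts the classic form of Amdahl's Law, $S(K) = 1 / ((1 - p) + p/K)$, to recover the parallel fraction $p$ implied by a $7.5\times$ speedup at $K = 16$. The abstract does not state $p$, so the value computed here is an inference under the assumption that the classic formula applies without extra overhead terms; the function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Classic Amdahl's Law: speedup with `workers` parallel units."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


def implied_parallel_fraction(speedup, workers):
    """Invert S = 1 / ((1 - p) + p / K) to solve for p."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / workers)


if __name__ == "__main__":
    # Figures taken from the abstract: 7.5x speedup at K = 16.
    p = implied_parallel_fraction(7.5, 16)
    print(round(p, 3))                       # 0.924
    print(round(amdahl_speedup(p, 16), 2))   # 7.5 (round-trip check)
    # Upper limit on speedup as K grows without bound: 1 / (1 - p).
    print(round(1.0 / (1.0 - p), 2))         # 13.24
```

Under this assumption, roughly 92% of the workload parallelizes, and the remaining serial share would cap the speedup near $13\times$ no matter how many agents are added.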


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
