arXiv

MOSAIC: Efficient Mixture-of-Agent Scheduling via Adaptive Aggregation and Inference Concurrency

Title: MOSAIC: Accelerating Mixture-of-Agent Workloads through Adaptive Aggregation and Concurrent Inference

Abstract: Mixture-of-Agents (MoA) architectures improve reasoning accuracy by routing each query to several specialized large language models (LLMs) and aggregating their responses. Serving these workloads efficiently on constrained GPU infrastructure is challenging, however. Skill-based routing creates uneven demand across experts, and mixing instruction-tuned models with long-reasoning variants causes large variation in output length. As a result, conventional schedulers suffer from severe GPU underutilization and throughput loss caused by load imbalance. To address these issues, we introduce MOSAIC, a scheduling framework for MoA workloads. First, an Integer Linear Program (ILP)-based scheduler jointly optimizes expert placement and per-worker prompt distribution using offline-profiled cost data; it replicates computationally intensive reasoning experts across workers while assigning lightweight models to specific instances. Second, confidence-aware adaptive aggregation uses inter-expert consensus to skip the expensive final aggregator LLM for queries on which the experts already agree. On a 4-GPU setup, MOSAIC achieves speedups of up to 2.5x in the expert stage, 4.23x in the aggregator stage, and 1.7x to 2.3x end-to-end over baseline schedulers, while keeping accuracy within 0.1 percentage points of the original.
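
The consensus-based bypass of the aggregator can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify how agreement is measured, so the majority-share rule, the `threshold` value, and the names `aggregate` and `run_aggregator` are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple


def aggregate(
    answers: Sequence[str],
    run_aggregator: Callable[[Sequence[str]], str],
    threshold: float = 0.75,  # assumed cutoff, not taken from the paper
) -> Tuple[str, bool]:
    """Return (final_answer, aggregator_was_called).

    Hypothetical rule: if the share of experts giving the most common
    answer reaches `threshold`, return that answer directly and skip
    the aggregator LLM; otherwise fall back to the aggregator.
    """
    if not answers:
        raise ValueError("need at least one expert answer")

    # Normalize so trivially different strings count as the same vote.
    votes = Counter(a.strip().lower() for a in answers)
    top_answer, top_count = votes.most_common(1)[0]
    agreement = top_count / len(answers)

    if agreement >= threshold:
        return top_answer, False  # consensus reached, aggregator skipped
    return run_aggregator(answers), True  # no consensus, aggregator runs
```

In this sketch the aggregator is passed in as a plain callable, so the expensive LLM call happens only on the fallback path. A real system would likely compare answers more robustly than by exact string match.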


Source: arXiv Generated at: 2026-06-03 00:00:00 UTC
