arXiv

Streaming Communication in Multi-Agent Reasoning

June 4, 2026 · Zhen Yang, Xiaogang Xu, Wen Wang, Cong Chen, Xander Xu, Ying-Cong Chen

Abstract: Conventional multi-agent reasoning frameworks follow a "generate-then-transfer" paradigm, in which each agent completes its reasoning before passing it on, so end-to-end latency grows linearly with pipeline depth. To address this, we present StreamMA, a multi-agent reasoning architecture that transmits each reasoning step to downstream agents as soon as it is generated. This pipelines adjacent agents and substantially reduces latency. Counterintuitively, pipelining also improves effectiveness. Because the quality of multi-step reasoning is uneven, with earlier steps generally more trustworthy than later ones, building on these dependable early steps rather than on the complete chain keeps downstream agents from being misled by errors in later steps. We substantiate these benefits with the first closed-form joint analysis of the stream, serial, and single protocols, which yields an effectiveness hierarchy, an upper bound on speedup, and the cost ratio between protocols. Evaluations on eight reasoning benchmarks spanning mathematics, science, and code, with two leading LLMs (Claude Opus 4.6 and GPT-5.4) and three topologies (Chain, Tree, and Graph), show that StreamMA outperforms baseline methods by an average of +7.3 percentage points, and by up to +22.4 percentage points on HMMT 2026 with Claude Opus 4.6-high. We further identify a "step-level scaling law": increasing the number of steps per agent consistently improves both effectiveness and efficiency, establishing a new scaling dimension that is orthogonal to, and composable with, agent-count scaling.
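
The latency argument above can be made concrete with a toy timing model. This is an illustrative sketch, not the paper's closed-form analysis. It assumes a Chain topology, that every agent emits the same number of steps, and that agent i can emit its step j only after it has emitted its own step j-1 and after agent i-1 has streamed its step j. Transfer overhead is ignored. The helper names `serial_latency`, `streaming_latency`, and `uniform` are hypothetical.

```python
from typing import List


def serial_latency(step_times: List[List[float]]) -> float:
    """Generate-then-transfer (toy model): agent i starts only after agent i-1
    has finished all of its steps, so the total is the sum of every step time."""
    return sum(sum(agent_steps) for agent_steps in step_times)


def streaming_latency(step_times: List[List[float]]) -> float:
    """Streaming (toy model, assumed dependency): agent i may emit step j once
    (a) it has emitted its own step j-1 and (b) agent i-1 has emitted step j.
    Assumes every agent emits the same number of steps."""
    n_agents = len(step_times)
    n_steps = len(step_times[0])
    # finish[i][j] = time at which agent i emits its step j
    finish = [[0.0] * n_steps for _ in range(n_agents)]
    for i in range(n_agents):
        for j in range(n_steps):
            own_prev = finish[i][j - 1] if j > 0 else 0.0
            upstream = finish[i - 1][j] if i > 0 else 0.0
            finish[i][j] = max(own_prev, upstream) + step_times[i][j]
    return finish[-1][-1]


def uniform(n_agents: int, n_steps: int, tau: float = 1.0) -> List[List[float]]:
    """Hypothetical helper: every step of every agent takes tau time units."""
    return [[tau] * n_steps for _ in range(n_agents)]


if __name__ == "__main__":
    # Compare protocols for a 4-agent chain as steps per agent (K) grows.
    for n_steps in (1, 4, 16, 64):
        times = uniform(n_agents=4, n_steps=n_steps)
        s, p = serial_latency(times), streaming_latency(times)
        print(f"K={n_steps:3d}  serial={s:6.1f}  stream={p:6.1f}  speedup={s / p:.2f}")
```

With uniform per-step time τ, the recurrence gives a serial latency of N·K·τ and a streaming latency of (N+K-1)·τ for N agents and K steps per agent, so the toy speedup N·K/(N+K-1) rises toward N as K grows. This only illustrates the pipelining intuition; the effectiveness gains reported above come from the authors' experiments, not from this model.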

Source: arXiv · Generated at: 2026-06-04 00:00:00 UTC