The Ringelmann Effect in Multi-Agent LLM Systems: A Scaling Law for Effective Team Size
Abstract:
Current approaches to scaling multi-agent Large Language Model (LLM) systems at inference time lack a standardized metric. Simply counting agents mistakenly equates financial cost with independent statistical evidence. To address this, we introduce a two-parameter scaling law for the ratio of effective team size $N_\text{eff}$ to nominal team size $N$: $R(N) = N_\text{eff}/N = 1/(1+c(N-1)N^{-\beta})$. The regime exponent $\beta$ places any system configuration into one of three asymptotic behaviors of $N_\text{eff}$: a hard ceiling at $1/c$ (where $\beta = 0$), sublinear growth as $N^\beta/c$ (where $0 < \beta < 1$), or linear scaling (where $\beta = 1$).
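The three regimes can be checked numerically with a minimal sketch of the law as stated above. The function names and the parameter values ($c = 0.5$ and the three $\beta$ values) are illustrative assumptions, not fitted values from this work.

```python
def effective_team_size(n: int, c: float, beta: float) -> float:
    """N_eff = N / (1 + c * (N - 1) * N**(-beta)), rearranged from R(N) = N_eff / N."""
    if n < 1:
        raise ValueError("team size must be at least 1")
    return n / (1.0 + c * (n - 1) * n ** (-beta))


def efficiency_ratio(n: int, c: float, beta: float) -> float:
    """R(N) = N_eff / N, the per-agent efficiency."""
    return effective_team_size(n, c, beta) / n


if __name__ == "__main__":
    c = 0.5  # illustrative value only (assumption)
    for beta in (0.0, 0.5, 1.0):  # hard ceiling, sublinear, linear
        sizes = [effective_team_size(n, c, beta) for n in (1, 5, 30, 1000)]
        print(f"beta={beta}: " + ", ".join(f"{s:.2f}" for s in sizes))
```

With $\beta = 0$ the printed values stay below $1/c$ however large $N$ becomes, while with $\beta = 1$ they grow in proportion to $N$, approaching $N/(1+c)$.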
Our analysis of the MMLU-Hard benchmark reveals that, although absolute performance varies from benchmark to benchmark, the structural parameters $(c, \beta)$ remain consistent across them. On free-form mathematical tasks, however, dense peer influence alters the dynamics: the answer-level regime collapses from sublinear to a hard ceiling, while the correctness-level fits remain at a hard ceiling throughout.
These results yield three key practical implications:
- Diminishing Returns in Homogeneous Teams: On MMLU-Hard, deploying thirty densely interacting debating agents yields no greater answer diversity than using a single agent.
- The Illusion of Debate: A noise placebo experiment demonstrates that self-correction on free-form math tasks scales at a rate of $4\times$. This suggests that within homogeneous teams, performance improvements typically attributed to "debate" are actually driven by self-re-evaluation rather than the exchange of peer-generated content.
- Predictive Pilots and Architectural Diversity: A small pilot study with $N \le 5$ agents can accurately predict the structural ceiling observed at $N=30$. Furthermore, among the configurations tested, only architectural diversity (heterogeneous teams) successfully reduced the parameter $c$ and allowed systems to escape the hard-ceiling regime. In contrast, interventions focused solely on communication modes failed to achieve this effect.
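The pilot-to-scale prediction can be illustrated with a short sketch that fits $(c, \beta)$ from measurements at $N \le 5$ and extrapolates to $N = 30$. The fitting procedure here is an assumption: it uses ordinary least squares on the linearized form $\log\big((N/N_\text{eff} - 1)/(N-1)\big) = \log c - \beta \log N$, which may differ from the estimator used in this work. The pilot data are synthetic, generated from the law itself with assumed parameters, and are not experimental results.

```python
import math


def n_eff(n, c, beta):
    """Effective team size under the two-parameter law."""
    return n / (1.0 + c * (n - 1) * n ** (-beta))


def fit_c_beta(ns, n_effs):
    """Least-squares fit of (c, beta) on the log-linearized law (assumed estimator)."""
    xs, ys = [], []
    for n, ne in zip(ns, n_effs):
        # N = 1 carries no information, and the log needs 0 < N_eff < N.
        if n < 2 or not (0.0 < ne < n):
            continue
        xs.append(math.log(n))
        ys.append(math.log((n / ne - 1.0) / (n - 1)))
    if len(xs) < 2:
        raise ValueError("need at least two usable team sizes with N >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (c, beta)


# Synthetic, noise-free pilot at N = 2..5 (assumed c = 0.6, beta = 0).
pilot_ns = [2, 3, 4, 5]
pilot_n_effs = [n_eff(n, 0.6, 0.0) for n in pilot_ns]

c_hat, beta_hat = fit_c_beta(pilot_ns, pilot_n_effs)
predicted_at_30 = n_eff(30, c_hat, beta_hat)
print(f"c={c_hat:.3f}, beta={beta_hat:.3f}, predicted N_eff(30)={predicted_at_30:.3f}")
```

With real, noisy measurements the unconstrained fit can return a $\beta$ outside $[0, 1]$, so a constrained or nonlinear fit may be preferable in practice.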
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



