arXiv

Not All Flips Are Conformity: Decomposing Stance Convergence in Multi-Agent LLM Debate

June 2, 2026 · Xiqi Hao, Zengqing Wu, Yu-Xuan Qiu, Chuan Xiao, Ruiqi Xu, Shuyuan Zheng, Jianbin Qin

Title: Beyond Conformity: Dissecting Stance Convergence in Multi-Agent LLM Debates

Abstract:

While multi-agent debate (MAD) is widely regarded as a potent method for enhancing the reasoning capabilities of large language models, the nature of the consensus it produces remains ambiguous. When agents align on a single answer, it is difficult to determine whether this convergence stems from authentic deliberation or mere social compliance. We show that the standard metric, the answer flip rate, fails to distinguish between three separate processes: spontaneous instability, conformity driven by stance, and persuasion driven by reasoning. To address this, we introduce a three-source decomposition framework that isolates these mechanisms using controlled counterfactual conditions.

In our primary experiments on MMLU-Pro, 37% of agent-question observations shifted under self-reflection alone. Robustness checks on GPQA-Diamond and across three model families also revealed substantial, model-dependent instability. Strict conformity accounted for 29% of cases in the primary setting and was predominantly harmful in the model replications, with answers changing from correct to wrong in 57-77% of instances.
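The three-source decomposition described above can be illustrated with a minimal sketch. It assumes each agent-question observation records whether the answer flipped under three counterfactual conditions (self-reflection only, peer stances without reasoning, and peer stances with reasoning), and it attributes each flip to the least informative condition that already produces it. The field names and the attribution rule are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass

SOURCES = ("instability", "conformity", "persuasion", "stable")


@dataclass(frozen=True)
class Observation:
    """One agent-question pair. All field names are hypothetical."""
    flips_self_reflection: bool  # flipped with no peer input at all
    flips_stance_only: bool      # flipped when shown peer answers without reasoning
    flips_full_debate: bool      # flipped when shown peer answers plus reasoning


def attribute(obs: Observation) -> str:
    """Assign a flip to the least informative condition that already triggers it."""
    if obs.flips_self_reflection:
        return "instability"
    if obs.flips_stance_only:
        return "conformity"
    if obs.flips_full_debate:
        return "persuasion"
    return "stable"


def decompose(observations: list[Observation]) -> dict[str, float]:
    """Return the share of observations attributed to each source."""
    if not observations:
        raise ValueError("need at least one observation")
    counts = Counter(attribute(o) for o in observations)
    total = len(observations)
    return {source: counts.get(source, 0) / total for source in SOURCES}


# Toy data, invented for illustration only.
toy = [
    Observation(True, True, True),
    Observation(False, True, True),
    Observation(False, False, True),
    Observation(False, False, False),
]
shares = decompose(toy)
```

Under this rule a raw flip rate of 75% on the toy data splits evenly across the three sources, which is the kind of distinction a single flip-rate number hides.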

Our controlled information-gradient experiment showed that even reasoning lacking substantive content led to error adoption rates of 20-39% among initially resistant agents, indicating that the presentation style of reasoning carries considerable persuasive power. Harmful conformity can also be predicted from Round 0 features (AUC = 0.79), and risk-targeted interventions reduced it by 13.6 percentage points (p < 0.001). However, without access to correctness labels or self-reflection controls, simply lowering the rate of peer adoption does not improve overall accuracy, because beneficial and harmful influence cannot be told apart.
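As a rough illustration of how a risk score for harmful conformity might be evaluated and used for targeting, the sketch below computes a rank-based AUC from predicted scores and binary labels, then flags observations above a threshold. The scores, labels, threshold, and function names are assumptions made for this example; the abstract does not specify the Round 0 features, the predictor, or the intervention.

```python
def roc_auc(scores: list[float], labels: list[int]) -> float:
    """Rank-based AUC: probability that a positive outscores a negative (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def flag_high_risk(scores: list[float], threshold: float) -> list[int]:
    """Return indices of observations whose risk score meets the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


# Toy risk scores and labels (1 = harmful conformity occurred), invented for illustration.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]

auc = roc_auc(toy_scores, toy_labels)
targeted = flag_high_risk(toy_scores, threshold=0.5)
```

A targeted intervention would then apply only to the flagged indices, leaving low-risk observations untouched.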
