arXiv

Not All Flips Are Conformity: Decomposing Stance Convergence in Multi-Agent LLM Debate

Title: Beyond Conformity: Dissecting Stance Convergence in Multi-Agent LLM Debates

Abstract:

Multi-agent debate (MAD) is widely regarded as an effective way to improve the reasoning of large language models, yet the nature of the consensus it produces remains ambiguous. When agents align on a single answer, it is difficult to tell whether the convergence stems from genuine deliberation or mere social compliance. We show that the standard metric, the answer flip rate, cannot distinguish among three separate processes: spontaneous instability, stance-driven conformity, and reasoning-driven persuasion. To address this, we introduce a three-source decomposition framework that isolates these mechanisms using controlled counterfactual conditions.

In our primary experiments on MMLU-Pro, 37% of agent-question observations shifted under self-reflection alone. Robustness checks across GPQA-Diamond and three model families further revealed substantial, model-dependent instability. Strict conformity accounted for 29% of cases in the primary setting and was predominantly detrimental in the model replications, with answers changing from correct to wrong in 57-77% of instances.

Our controlled information-gradient experiment showed that even reasoning lacking substantive content led to error adoption rates of 20-39% among initially resistant agents, indicating that the presentation style of reasoning carries considerable persuasive power. Harmful conformity can also be predicted from Round 0 features (AUC = 0.79), and risk-targeted interventions reduced it by 13.6 percentage points (p < 0.001). However, without access to correctness labels or self-reflection controls, simply lowering the peer adoption rate does not improve overall accuracy, because beneficial and harmful influence cannot be told apart.
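
To make the idea of separating flip sources concrete, here is a minimal sketch in Python. It is not the paper's procedure: the abstract does not specify the counterfactual conditions, so the three conditions used here (self-reflection only, peer stances without reasoning, peer stances with reasoning), the additive differencing of flip rates, and all names (`Observation`, `decompose`) are assumptions made for illustration. The data are toy values, not results from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One agent-question pair, observed under three hypothetical conditions."""
    flipped_self_reflection: bool  # agent re-answers with no peer input
    flipped_stance_only: bool      # agent sees peer answers, no reasoning
    flipped_full_debate: bool      # agent sees peer answers with reasoning


def flip_rate(flags):
    """Fraction of observations in which the answer changed."""
    flags = list(flags)
    return sum(flags) / len(flags)


def decompose(observations):
    """Attribute the full-debate flip rate to three sources by differencing.

    Assumption: the conditions are nested, so each added ingredient
    (peer stance, then peer reasoning) accounts for the extra flips.
    """
    r_self = flip_rate(o.flipped_self_reflection for o in observations)
    r_stance = flip_rate(o.flipped_stance_only for o in observations)
    r_full = flip_rate(o.flipped_full_debate for o in observations)
    return {
        "instability": r_self,             # flips with no peers at all
        "conformity": r_stance - r_self,   # extra flips from seeing stances
        "persuasion": r_full - r_stance,   # extra flips from seeing reasoning
        "total": r_full,
    }


# Toy data (hypothetical): 10 observations with 2, 5, and 7 flips
# under the three conditions respectively.
observations = [Observation(i < 2, i < 5, i < 7) for i in range(10)]
result = decompose(observations)
```

Under these assumptions the three components sum to the raw flip rate, which illustrates why a single flip-rate number cannot by itself say how much of the convergence is instability, conformity, or persuasion.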


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
