Easier to Mislead Than to Correct: Harmful and Beneficial Revision in LLM Conformity
Title: The Asymmetry of Influence: Distinguishing Harmful from Beneficial Revision in Large Language Model Conformity
Abstract:
As Large Language Models (LLMs) become integral components of multi-agent ecosystems, they increasingly encounter and respond to the outputs of other agents. A significant vulnerability in this setting is conformity: a model may abandon its original response simply because other agents have converged on a different answer. Previous research has established that LLMs tend to shift their answers toward the majority view, but it has remained unclear whether such revisions mainly correct errors or introduce new ones. This study uses a controlled experimental framework in which an LLM gives an initial answer, is then shown simulated peer responses, and finally commits to a final answer. By manipulating two social cues, consensus structure and the authority labels attributed to peers, we measure their effect on both beneficial revisions (an initially wrong answer becomes correct) and harmful revisions (an initially correct answer becomes wrong). Across four open-weight LLMs and seven question-answering datasets, we find a striking asymmetry: peer agreement significantly increases the rate at which initially correct models are misled, whereas it is far less effective at correcting initially wrong models. Authority labels further increase the likelihood that a model adopts the endorsed answer, regardless of whether that answer is correct. Standard reasoning interventions, including chain-of-thought prompting and reflection, fail to consistently reduce harmful revisions without also suppressing beneficial ones. These results suggest that multi-agent LLM architectures should prioritize verifying peer inputs over simple aggregation.
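The distinction between harmful and beneficial revision can be made concrete with a small scoring routine. The following is a minimal sketch, assuming each trial records the model's initial answer, its final answer after seeing simulated peers, and a gold answer, and assuming exact string match decides correctness. The names `Trial` and `revision_rates` are hypothetical and not taken from the paper; conditioning each rate on the initial answer is one plausible reading of the asymmetry described above, not necessarily the authors' exact metric.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Trial:
    """One hypothetical conformity trial (names are illustrative, not from the paper)."""
    initial: str  # model's answer before seeing simulated peers
    final: str    # model's answer after seeing simulated peers
    gold: str     # reference answer


def revision_rates(trials: Iterable[Trial]) -> Dict[str, float]:
    """Assumed metrics, using exact string match for correctness:
    harmful_rate    = P(final wrong   | initial correct)
    beneficial_rate = P(final correct | initial wrong)
    """
    init_correct = init_wrong = harmful = beneficial = 0
    for t in trials:
        if t.initial == t.gold:
            init_correct += 1
            harmful += int(t.final != t.gold)  # correct -> wrong: misled by peers
        else:
            init_wrong += 1
            beneficial += int(t.final == t.gold)  # wrong -> correct: fixed by peers
    return {
        # Guard against empty groups so the sketch never divides by zero.
        "harmful_rate": harmful / init_correct if init_correct else 0.0,
        "beneficial_rate": beneficial / init_wrong if init_wrong else 0.0,
    }


if __name__ == "__main__":
    trials = [
        Trial("A", "B", "A"),  # harmful revision
        Trial("A", "B", "A"),  # harmful revision
        Trial("A", "A", "A"),  # stayed correct
        Trial("C", "C", "A"),  # stayed wrong
        Trial("D", "A", "A"),  # beneficial revision
    ]
    rates = revision_rates(trials)
    print(f"harmful={rates['harmful_rate']:.2f} beneficial={rates['beneficial_rate']:.2f}")
    # prints harmful=0.67 beneficial=0.50
```

Under this sketch, a change from one wrong answer to a different wrong answer counts as neither type, and a real evaluation would likely need answer normalization before comparing strings.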
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC




