DiscourseFlip: An Oblique Discourse-Level Opinion Manipulation Attack against Black-box Retrieval-Augmented Generation
Title: DiscourseFlip: A Stealthy, Discourse-Level Opinion Manipulation Attack Targeting Black-Box Retrieval-Augmented Generation
Abstract: Retrieval-Augmented Generation (RAG) systems are increasingly pervasive and impactful, but their dependence on external data sources exposes them to attacks through compromised retrieval content. Existing RAG attacks predominantly target single queries or narrow, topic-specific query sets, which restricts their real-world applicability and makes them easier to detect. This study presents a novel threat model, discourse-level opinion manipulation, in which coordinated influence across a semantic query network shifts opinions throughout a broad, multi-topic query landscape. We define this threat in a black-box setting and introduce DiscourseFlip, an agentic, graph-based attack mechanism that strategically allocates a constrained poisoning budget to maximize opinion deviation at the discourse level. Comprehensive experiments show that DiscourseFlip reliably triggers the intended opinion shifts across the contextualized query network, with greater coverage and efficacy than existing baseline methods. User studies further indicate that the attack is both effective and sufficiently camouflaged to avoid user detection. Finally, systematic evaluations show that current mitigation strategies are insufficient against discourse-level manipulation, underscoring the need for more resilient and adaptive defenses.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




