Dive into Ambiguity: A*-Inspired Multi-Agents Commonsense Obfuscation Attack on LLM Prompts
Title: Navigating Ambiguity: An A*-Driven Approach to Multi-Agent Commonsense Obfuscation in LLM Prompts
Abstract: While large language models (LLMs) demonstrate superior capabilities in reasoning and knowledge-heavy applications, they remain susceptible to adversarial attacks at the prompt level. These attacks are designed to maintain the original intent while inducing commonsense hallucinations, a vulnerability that poses significant risks as LLMs are increasingly deployed in safety-critical sectors where factual accuracy is paramount. Current mitigation strategies often suffer from inefficiencies or fail to replicate the adaptive tactics employed by actual adversaries. To address this, we introduce an A*-inspired Factual Error Induction Framework, engineered to produce prompts that are semantically coherent yet obfuscated. Central to this framework is a Hierarchical Rewrite Strategy, governed by a dynamic semantic dispersion coefficient $\gamma$. This mechanism balances conservative modifications in the initial stages with more aggressive obfuscation in later phases, adhering to a reverse simulated annealing schedule. Furthermore, we incorporate an Agentic Mechanism Labeling component to identify and refine adversarial mechanisms, thereby providing interpretable reverse optimization. Theoretical analysis confirms that prompt rewriting operates as a contractive recurrence, resulting in semantic collapse as $\gamma$ diminishes. Experimental results across various LLMs indicate that our approach achieves higher attack success rates than exhaustive search methods while requiring fewer attempts.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
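The abstract describes a dispersion coefficient $\gamma$ that keeps early rewrites conservative and later ones more aggressive, following a reverse simulated annealing schedule. The abstract does not give the schedule's functional form, so the following is only a minimal sketch of one shape such a schedule could take. The function name `gamma_schedule`, the bounds `gamma_min` and `gamma_max`, and the quadratic ramp are all assumptions made for illustration and are not taken from the paper.

```python
def gamma_schedule(step: int, total_steps: int,
                   gamma_min: float = 0.1, gamma_max: float = 0.9) -> float:
    """Hypothetical dispersion coefficient for rewrite step `step`.

    Assumption: gamma grows from gamma_min to gamma_max over the run,
    which is the opposite direction of a classical annealing temperature.
    """
    if total_steps <= 1:
        return gamma_max
    # Fraction of the run completed, clamped to [0, 1].
    progress = min(max(step / (total_steps - 1), 0.0), 1.0)
    # Quadratic ramp (an illustrative choice): small changes early,
    # larger changes toward the end of the run.
    return gamma_min + (gamma_max - gamma_min) * progress ** 2


# Example: coefficients for a five-step run, smallest first.
schedule = [gamma_schedule(t, 5) for t in range(5)]
```

With these placeholder defaults the values are non-decreasing across steps, which mirrors the conservative-then-aggressive ordering the abstract describes. Any other monotone ramp would serve the same illustrative purpose.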




