Persona Attack: Incremental Memory Injection Jailbreak Attack against Large Language Models
Abstract:
Despite continuous advances in safety training, Large Language Models (LLMs) remain susceptible to jailbreak attacks as they become more user-friendly. Conventional jailbreak strategies generally rely on isolated prompt injections and often overlook the model's capacity to retain conversational context and user directives over time. This study introduces Persona Attack, a novel jailbreak technique grounded in memory injection that systematically manipulates the model's context window through a sequential process. Our experiments across several prominent LLMs indicate that, as injected instructions accumulate in the model's memory, these directives increasingly supersede the model's internal safety alignment. Our empirical findings also show that the attack's effectiveness depends on both the model's specific memory architecture and the particular combination of instructions employed, with an attack success rate as high as 95% under certain instruction configurations.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




