Detector-Evasive LLM Paraphrasing via Constrained Policy Optimization
Title: Enforcing Semantic Constraints for LLM Paraphrasing to Evade Detection via Constrained Policy Optimization
Abstract: Current AI-generated text detectors remain vulnerable to paraphrasing attacks, including paraphrasers explicitly guided to evade detection. However, existing detector-evasion strategies offer little precise control over how well semantic meaning is preserved. Directly optimizing for evasion often degrades fine-grained semantics, while conventional scalarized rewards manage the balance between evasion and semantic integrity only indirectly and are sensitive to the choice of weights. To overcome this bottleneck, we model detector-evasive LLM paraphrasing as a Constrained Markov Decision Process (CMDP), in which detector evasion is the primary objective and semantic preservation is enforced through an explicit constraint. We introduce Detector Evasion Policy Optimization (DEPO), a Lagrangian primal-dual reinforcement learning algorithm that incorporates a novel group-based policy update mechanism inspired by GRPO. DEPO dynamically adjusts the trade-off between semantic preservation and evasion during training, allowing the policy to raise attack success rates while strictly adhering to a predefined semantic preservation bound. Empirical evaluations on the MAGE, M4, RAID, and peer-review datasets, using the MAGE, RoBERTa, RADAR, Binoculars, and Fast-DetectGPT detectors, demonstrate that DEPO delivers strong detector evasion while accurately satisfying the semantic constraint. The method is also robust across domains, detectors, and prompt levels.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
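
Illustration: the abstract describes a Lagrangian primal-dual scheme with a group-based update but does not give the update rules. The following is a minimal sketch of how such a scheme is commonly set up, not the authors' implementation. It assumes each prompt yields a group of sampled paraphrases, each scored with an evasion reward and a semantic-similarity score, and that the constraint requires the mean similarity to reach a threshold `tau`. The function names, the scores, `tau`, and the learning rate are all hypothetical.

```python
from statistics import mean, pstdev


def group_advantages(evasion, similarity, lam, eps=1e-8):
    """Group-relative advantages of a Lagrangian-combined reward.

    Assumption: the combined reward is evasion + lam * similarity.
    The constant term -lam * tau of the Lagrangian is omitted because
    it cancels when the group mean is subtracted.
    """
    combined = [r + lam * s for r, s in zip(evasion, similarity)]
    mu = mean(combined)
    sigma = pstdev(combined)
    # Normalize within the group, in the spirit of GRPO-style updates.
    return [(x - mu) / (sigma + eps) for x in combined]


def dual_update(lam, similarity, tau, lr=0.1):
    """Projected gradient ascent on the Lagrange multiplier.

    lam grows while the mean similarity is below tau (constraint
    violated) and shrinks toward zero once the constraint is met.
    """
    violation = tau - mean(similarity)
    return max(0.0, lam + lr * violation)


# Hypothetical scores for one group of four sampled paraphrases.
evasion = [0.9, 0.2, 0.6, 0.4]         # higher = evades the detector better
similarity = [0.70, 0.95, 0.85, 0.90]  # higher = meaning better preserved
tau = 0.88                             # assumed semantic-preservation threshold

adv_unconstrained = group_advantages(evasion, similarity, lam=0.0)
adv_constrained = group_advantages(evasion, similarity, lam=5.0)
new_lam = dual_update(0.0, similarity, tau)
```

With `lam=0.0` the most evasive sample receives the largest advantage, while a large multiplier shifts the preference toward the sample that best preserves meaning. This is the sense in which a learned multiplier replaces a hand-tuned reward weight: it is adjusted from the measured constraint violation rather than fixed in advance.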




