Privacy Policy Enforcement Guardrails for Data-Sensitive Retrieval-Augmented Generation
Title: Guardrails for Enforcing Privacy Policies in Data-Sensitive Retrieval-Augmented Generation
Abstract: Conventional personally identifiable information (PII) filters frequently miss contextual data leakage in Retrieval-Augmented Generation (RAG) systems, particularly when clusters of non-regulated attributes combine to identify an individual. To address this, we present a Privacy Policy Enforcement (PPE) framework that uses dual one-class density estimators over fused text embeddings, with a calibrated abstention region for out-of-distribution inputs. Using an axis-stratified, multi-LLM synthetic data pipeline spanning the medical, financial, and legal sectors, we show that traditional Gaussian Mixture baselines fail on borderline-safe stress tests because they key on linguistic register rather than content. In contrast, our proposed T3+OCSVM detector, trained on both safe and borderline-safe data, achieves a borderline AUROC above 0.93 while reducing false positives by 44 to 55 percentage points and preserving millisecond-level latency. Compared with supervised MLP classifiers and 14B-parameter LLM judges, the framework is more operationally viable: the former suffer from high abstention rates, and the latter from latency and calibration problems. This approach establishes a rigorous stress-testing benchmark for any classifier trained on synthetic data.
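The abstract does not say how the two one-class estimators and the abstention region are combined, so the following is only a minimal sketch of one plausible decision rule, not the authors' method. It assumes each estimator returns a score that is higher for inputs resembling its own training class (for example, the output of a one-class SVM's decision function), and that each threshold is calibrated on held-out scores from that class. The names `calibrate_threshold`, `decide`, and the `coverage` parameter are hypothetical.

```python
def calibrate_threshold(heldout_scores, coverage=0.95):
    """Pick a threshold that roughly `coverage` of held-out in-class scores meet.

    Assumption: higher score means "more like this estimator's training class".
    """
    ordered = sorted(heldout_scores)
    # Index of the lowest score that is still counted as in-class.
    k = round((1.0 - coverage) * len(ordered))
    k = min(max(k, 0), len(ordered) - 1)
    return ordered[k]


def decide(safe_score, sensitive_score, safe_thr, sensitive_thr):
    """Combine two one-class scores into allow / block / abstain.

    Hypothetical rule: act only when exactly one estimator claims the input.
    If neither does (out-of-distribution) or both do (ambiguous), abstain.
    """
    in_safe = safe_score >= safe_thr
    in_sensitive = sensitive_score >= sensitive_thr
    if in_safe and not in_sensitive:
        return "allow"
    if in_sensitive and not in_safe:
        return "block"
    return "abstain"


if __name__ == "__main__":
    # Toy held-out scores; a real system would score fused text embeddings.
    safe_thr = calibrate_threshold([0.4, 0.9, 0.7, 0.6, 0.8], coverage=0.8)
    sensitive_thr = calibrate_threshold([0.5, 0.3, 0.9, 0.7, 0.6], coverage=0.8)
    print(decide(0.85, 0.10, safe_thr, sensitive_thr))
    print(decide(0.05, 0.02, safe_thr, sensitive_thr))
```

In this sketch the abstention region is wherever the two estimators disagree with a clean verdict, so widening `coverage` shrinks the "neither" region and enlarges the "both" region. A deployed system would need to choose that trade-off from validation data.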
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





