Auditing Privacy in Multi-Tenant RAG under Account Collusion
Abstract:
Multi-tenant Retrieval-Augmented Generation (RAG) services typically define privacy boundaries at the account level, giving each account an $(\varepsilon_{\text{acc}},\delta_{\text{acc}})$-differential privacy (DP) guarantee with respect to the tenant index. We show that this accounting significantly underestimates data leakage when accounts sharing the same index collude. Specifically, for retrieval mechanisms that use a "noise-then-select" strategy, $k$ coordinated accounts from the same tenant incur a joint leakage rate of $\Theta(\sqrt{k}\,\varepsilon_{\text{acc}})$ rather than the expected $\varepsilon_{\text{acc}}$. We present a corresponding membership-inference attack and empirically confirm the predicted $\sqrt{k}$ trend in Area Under the Curve (AUC) across scalar, top-$K$, trained-embedder, and production-scale HNSW configurations. We also introduce a third-party-verifiable audit protocol that attests to the integrity of noise-then-select retrieval. The protocol issues a $(\textsf{PASS},\varepsilon_{\text{audit}})$ report for coalitions up to a specified cap $k_{\max}$, without revealing the underlying index or altering the retrieval decision logic. This claim applies only to the retrieval channel; assessing generation-channel leakage and estimating coalition size robustly against adversaries are separate, complementary audit objectives.
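To make the $\sqrt{k}$ scaling concrete, the following is a minimal sketch of the scalar case only, under assumptions that are not taken from the paper: each account sees a retrieval score perturbed by independent Gaussian noise of standard deviation `sigma`, the presence of a target document shifts the score by `delta`, and the coalition simply averages its $k$ observations. The function names and parameters are hypothetical. Averaging shrinks the noise standard deviation by $\sqrt{k}$, so the membership-inference AUC becomes $\Phi\big(\sqrt{k}\,\Delta/(\sigma\sqrt{2})\big)$, which is the kind of trend the abstract describes.

```python
import math
import random


def normal_cdf(x: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def theoretical_auc(k: int, delta: float, sigma: float) -> float:
    """AUC of a threshold test on the mean of k noisy scores.

    Assumption (illustrative): member mean ~ N(delta, sigma^2 / k) and
    non-member mean ~ N(0, sigma^2 / k), so their difference is
    N(delta, 2 * sigma^2 / k) and AUC = P(difference > 0).
    """
    return normal_cdf(math.sqrt(k) * delta / (sigma * math.sqrt(2.0)))


def empirical_auc(k: int, delta: float, sigma: float,
                  trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same AUC.

    Each trial draws one member coalition and one non-member coalition
    and counts how often the member's averaged score is larger.
    """
    rng = random.Random(seed)

    def coalition_mean(shift: float) -> float:
        # Every colluding account receives independently noised output.
        return sum(shift + rng.gauss(0.0, sigma) for _ in range(k)) / k

    wins = sum(
        1 for _ in range(trials)
        if coalition_mean(delta) > coalition_mean(0.0)
    )
    return wins / trials


if __name__ == "__main__":
    delta, sigma = 0.5, 1.0  # hypothetical score gap and noise scale
    for k in (1, 4, 16):
        print(k, round(theoretical_auc(k, delta, sigma), 3),
              round(empirical_auc(k, delta, sigma), 3))
```

In this toy model, quadrupling the coalition size has the same effect on AUC as doubling the score gap `delta`, which is one way to read the $\Theta(\sqrt{k}\,\varepsilon_{\text{acc}})$ claim. The sketch does not model top-$K$ selection, learned embedders, or HNSW indexes.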
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





