arXiv

Causal Evaluation of Membership Inference Attacks

June 2, 2026 · Mathieu Even, Clément Berenfeld, Linus Bleistein, Tudor Cebere, Julie Josse, Aurélien Bellet

Title: A Causal Analysis of Membership Inference Attacks

Abstract:

Membership Inference Attacks (MIAs) serve as a primary tool for quantifying model memorization and evaluating privacy vulnerabilities by differentiating between training examples (members) and novel data (non-members). However, the conventional evaluation of MIAs necessitates extensive retraining, a process that imposes prohibitive computational costs on large-scale models. Consequently, researchers frequently rely on one-run approaches, which involve a single training phase with randomized data inclusion, or zero-run methods, which allow for post-hoc assessment. Despite their widespread adoption, the statistical rigor of these alternatives remains uncertain.

To bridge this gap, we frame MIA evaluation as a causal inference problem, defining memorization as the causal effect of including a specific data point in the training set. This perspective identifies and formalizes key biases in current protocols: one-run methods suffer from interference among jointly included data points, while zero-run evaluations are further biased by distribution shift between member and non-member data. We develop causal counterparts to standard MIA metrics and introduce practical estimators for the multi-run, one-run, and zero-run settings, with non-asymptotic consistency guarantees. Experiments across several settings, including both pretrained and fine-tuned Large Language Models (LLMs), show that this framework enables reliable measurement of MIA performance without retraining, even under distribution shift. Overall, our work provides a principled theoretical foundation for privacy evaluation in modern AI systems.
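
As a rough illustration of reading memorization as a causal effect, the sketch below computes a textbook difference-in-means estimate of the effect of inclusion on an attack score, assuming inclusion was randomized as in a one-run setup. It is not the estimator proposed in the paper: the function name, the score values, and the inclusion flags are all hypothetical, and the sketch deliberately ignores the interference between jointly included points that the abstract identifies as a source of bias.

```python
from statistics import mean


def difference_in_means(scores, included):
    """Textbook difference-in-means estimate of the effect of inclusion.

    scores:   attack scores, one per data point (hypothetical values).
    included: booleans, True if the point was randomly placed in the
              training set, False otherwise.

    Assumption: inclusion is randomized and points do not interfere with
    one another; the abstract notes that the second assumption fails in
    one-run evaluations.
    """
    if len(scores) != len(included):
        raise ValueError("scores and included must have the same length")
    member = [s for s, t in zip(scores, included) if t]
    non_member = [s for s, t in zip(scores, included) if not t]
    if not member or not non_member:
        raise ValueError("need at least one member and one non-member")
    return mean(member) - mean(non_member)


# Hypothetical attack scores for four points, two of them included.
scores = [0.9, 0.2, 0.8, 0.3]
included = [True, False, True, False]
effect = difference_in_means(scores, included)
print(f"estimated effect of inclusion on the attack score: {effect:.2f}")
```

A positive value means the attack scores included points higher on average than excluded ones. In a zero-run setting, where members and non-members come from different distributions, this naive difference would also absorb the distribution shift and could no longer be read as a causal effect.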
