arXiv

Fair Finetuning Mitigates Distribution Inference Attacks

June 2, 2026 · Rakshit Naidu

Title: Fair Fine-Tuning Offers Defense Against Distribution Inference Attacks

Abstract:

Machine learning systems trained on confidential datasets risk inadvertently exposing population-level details about their training distributions, a vulnerability exploited by distribution inference attacks (DIAs). In this setting, a black-box adversary can deduce sensitive demographic characteristics, such as the proportion of specific subgroups, without direct access to the training data. Although countermeasures such as differential privacy and property unlearning have been proposed, the relationship between fairness constraints and distributional leakage has not yet been investigated.
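To make the threat model concrete, the following is a minimal sketch of one generic black-box decision rule: the adversary summarizes each model by a single scalar statistic obtained from queries, calibrates a threshold on shadow models trained under each of two candidate distributions, and then guesses which distribution produced the victim model. The shadow-model setup, the choice of statistic, the function names, and all numeric values are illustrative assumptions, not details taken from the paper, whose adversary may differ.

```python
from statistics import mean

def fit_threshold(stats_world0, stats_world1):
    """Place the threshold midway between the two worlds' mean statistics.

    Returns the threshold and a flag saying whether world 1 tends to
    produce the higher statistic.
    """
    m0, m1 = mean(stats_world0), mean(stats_world1)
    return (m0 + m1) / 2, m1 > m0

def guess_world(stat, threshold, world1_is_higher):
    """Return 0 or 1: the candidate training distribution the adversary picks."""
    above = stat > threshold
    return int(above == world1_is_higher)

# Hypothetical statistics, e.g. each shadow model's positive-prediction
# rate on a fixed query set (values invented for illustration).
shadow_world0 = [0.30, 0.34, 0.32]
shadow_world1 = [0.44, 0.40, 0.42]

threshold, world1_is_higher = fit_threshold(shadow_world0, shadow_world1)
victim_guess = guess_world(0.41, threshold, world1_is_higher)
print(threshold, victim_guess)
```

The adversary succeeds to the extent that the statistic separates the two worlds; a defense aims to make the shadow statistics from both worlds overlap so that no threshold does much better than chance.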

To address this gap, we introduce Fair Fine-tuning (FFt), a method in which a pre-trained model is fine-tuned on samples drawn from the complementary distribution subject to an Equalized Odds (EO) constraint. Our theoretical analysis establishes the bound $\text{Adv}(\mathcal{A},M_f) \le \Delta_{\text{EO}} \cdot W$, where $\Delta_{\text{EO}}$ is the model's measured EO disparity and $W$ represents the degree to which the two training distributions can be distinguished based on their sensitive-attribute makeup. We prove that this bound is tight and identify the necessary conditions under which FFt diminishes adversarial advantage.
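As a rough illustration of how the bound could be evaluated in practice, the sketch below computes an EO disparity from predictions and multiplies it by a supplied value of $W$. It uses one common definition of EO disparity (the larger of the between-group gaps in true-positive and false-positive rates for a binary label and a binary sensitive attribute); the paper's exact definition of $\Delta_{\text{EO}}$ may differ. The abstract does not say how $W$ is computed, so it is treated here as a given input, and the value 0.4 along with the toy labels and predictions is purely hypothetical.

```python
import numpy as np

def equalized_odds_gap(y_true, y_pred, group):
    """Largest between-group gap in TPR or FPR.

    Assumes binary labels, binary predictions, a binary sensitive
    attribute, and at least one sample in every (label, group) cell.
    """
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    gaps = []
    for y in (0, 1):  # y=1 compares TPRs, y=0 compares FPRs
        rates = [y_pred[(y_true == y) & (group == g)].mean() for g in (0, 1)]
        gaps.append(abs(rates[0] - rates[1]))
    return max(gaps)

def advantage_bound(delta_eo, w):
    """Right-hand side of the bound: Delta_EO * W."""
    return delta_eo * w

# Toy data (invented): 8 samples per group, 4 positives and 4 negatives each.
y_true = [1, 1, 1, 1, 0, 0, 0, 0] * 2
group = [0] * 8 + [1] * 8
y_pred = [1, 1, 1, 0, 0, 0, 0, 1,   # group 0: TPR 0.75, FPR 0.25
          1, 1, 0, 0, 0, 0, 0, 0]   # group 1: TPR 0.50, FPR 0.00

delta_eo = equalized_odds_gap(y_true, y_pred, group)
bound = advantage_bound(delta_eo, w=0.4)  # w=0.4 is a hypothetical value
print(delta_eo, bound)
```

Because the bound is a product, it shrinks as the EO disparity shrinks for any fixed $W$, which is the intuition behind enforcing an EO constraint during fine-tuning.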

Our empirical evaluation covers six diverse datasets across tabular (ACS Income, COMPAS, German Credit), image (UTKFaces), and natural language processing (Bias in Bios) domains. Results indicate that rehearsal-based FFt reliably lowers the adversarial accuracy gap beneath the detection threshold of $\tau=0.1$ in all tested scenarios. Notably, on the ACS Income dataset, the gap decreased from approximately $15\%$ to less than $4\%$. This study presents the first formal bound linking a model’s measured EO disparity directly to its adversarial advantage within the DIA framework, thereby establishing a novel pathway for integrated fairness and privacy protections.

Source: arXiv Generated at: 2026-06-02 00:00:00 UTC