Perturbation Effects on Accuracy and Fairness among Similar Individuals
Title: Analyzing the Impact of Perturbations on Accuracy and Fairness in Comparable Cases
Abstract: Deep neural networks are susceptible to adversarial perturbations, which can concurrently undermine prediction robustness and individual fairness across various application domains. Current evaluation methods usually examine these aspects separately, a practice that masks significant failure modes. To address this limitation, we define Robust Individual Fairness (RIF), a standard requiring that predictions under semantic-preserving (or truth-condition-preserving) perturbations remain accurate relative to ground truth and consistent among semantically equivalent individuals. To detect RIF violations, we propose RIFair, a black-box adversarial framework employing a decoupled perturbation strategy to generate instance pairs that preserve semantics but exhibit a lack of robustness and/or fairness. Our experiments, conducted on multiple model architectures and real-world textual datasets, reveal that metrics focusing solely on robustness or fairness often overlook Robust Biased and Unrobust Fair behaviors. RIFair effectively uncovers these latent vulnerabilities, establishing RIF as an essential criterion for evaluating model trustworthiness. The code for these experiments is available at https://github.com/Xuran-LI/RIFair.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
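
As an illustration of the RIF criterion stated in the abstract, the sketch below sorts a single test case into the four outcomes the abstract implies (Robust Fair, Robust Biased, Unrobust Fair, Unrobust Biased). It is a minimal sketch under assumptions and not the RIFair implementation. It assumes that "robust" means the prediction on a perturbed instance matches the ground-truth label, and that "fair" means this prediction matches the prediction on a semantically equivalent counterpart. The names `RIFOutcome`, `classify_pair`, and `satisfies_rif` are hypothetical and do not come from the paper or its repository.

```python
from enum import Enum


class RIFOutcome(Enum):
    """Hypothetical labels for the four robustness/fairness combinations."""
    ROBUST_FAIR = "robust fair"
    ROBUST_BIASED = "robust biased"
    UNROBUST_FAIR = "unrobust fair"
    UNROBUST_BIASED = "unrobust biased"


def classify_pair(y_true, pred_perturbed, pred_counterpart):
    """Classify one perturbed instance and its semantically equivalent counterpart.

    Assumed reading of the abstract (not the paper's formal definition):
      - robust: the prediction on the perturbed instance equals the ground truth
      - fair:   that prediction equals the prediction on the counterpart
    """
    robust = pred_perturbed == y_true
    fair = pred_perturbed == pred_counterpart
    if robust and fair:
        return RIFOutcome.ROBUST_FAIR
    if robust:
        return RIFOutcome.ROBUST_BIASED
    if fair:
        return RIFOutcome.UNROBUST_FAIR
    return RIFOutcome.UNROBUST_BIASED


def satisfies_rif(y_true, pred_perturbed, pred_counterpart):
    """Under this reading, RIF holds only when the case is both robust and fair."""
    outcome = classify_pair(y_true, pred_perturbed, pred_counterpart)
    return outcome is RIFOutcome.ROBUST_FAIR


if __name__ == "__main__":
    # Accurate on the perturbed instance, but the counterpart gets a different label.
    print(classify_pair(y_true=1, pred_perturbed=1, pred_counterpart=0))
    # Wrong on the perturbed instance, yet consistent with the counterpart.
    print(classify_pair(y_true=1, pred_perturbed=0, pred_counterpart=0))
```

Under this reading, a robustness-only check would pass the first case and a fairness-only check would pass the second, which is the kind of blind spot the abstract attributes to evaluating the two properties separately.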




