arXiv

Counterfactual Explanations for Deep Two-Sample Testing

June 4, 2026 · Wei-Cheng Lai, Marco Simnacher, Christoph Lippert

Title: Generating Counterfactual Explanations for Deep Two-Sample Testing

Abstract: Two-sample testing is a cornerstone for identifying distributional shifts across scientific fields, but traditional methods, such as kernel-based tests, often struggle with high-dimensional, structured data like images. Recent deep two-sample tests improve sensitivity by leveraging learned representations, yet they offer little visibility into which data features are responsible for rejecting the null hypothesis $H_0$. To bridge this gap, we introduce a counterfactual explanation framework for deep two-sample testing. The approach produces sample-level modifications that shift observations from a source group toward a target group while explicitly minimizing the discrepancy quantified by the test. By integrating a diffusion autoencoder with a pretrained deep two-sample test model, our method optimizes a maximum mean discrepancy (MMD) objective within the test model’s representation space to generate plausible counterfactuals. We assess the distribution-level impact of these transformations by examining changes in the test statistic and the associated two-sample p-values. Our evaluation spans synthetic 2D shape datasets and two distinct MRI cohorts. In both settings, the counterfactual transformations consistently yield higher p-values than the original samples, indicating that the edited source set aligns more closely with the target distribution according to the test. To check that counterfactuals remain minimal and faithful to the original inputs, we measure their perceptual distance from the originals using LPIPS. These edits yield interpretable evidence highlighting the features linked to the observed group differences. Notably, in the MRI analysis, the localized modifications align with established anatomical distinctions between the cohorts.
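To make the evaluation logic concrete, the following is a minimal sketch of how an MMD statistic and a permutation p-value could be compared before and after an edit. It is not the method described in the abstract: the Gaussian kernel, the bandwidth, the biased estimator, the permutation test, and the function names (`rbf_kernel`, `mmd2_biased`, `permutation_p_value`) are all assumptions chosen for illustration. Raw 2D points stand in for the test model's learned representations, and a simple mean shift stands in for the diffusion-autoencoder edit.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth):
    # Pairwise squared Euclidean distances, then a Gaussian kernel.
    sq = (a**2).sum(1)[:, None] + (b**2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def mmd2_biased(x, y, bandwidth=1.0):
    # Biased (V-statistic) estimate of the squared MMD.
    return (rbf_kernel(x, x, bandwidth).mean()
            + rbf_kernel(y, y, bandwidth).mean()
            - 2.0 * rbf_kernel(x, y, bandwidth).mean())

def permutation_p_value(x, y, n_perm=500, bandwidth=1.0, seed=0):
    # p-value under H_0: relabel the pooled samples at random and count
    # how often the permuted statistic reaches the observed one.
    rng = np.random.default_rng(seed)
    observed = mmd2_biased(x, y, bandwidth)
    pooled = np.vstack([x, y])
    n = len(x)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        stat = mmd2_biased(pooled[idx[:n]], pooled[idx[n:]], bandwidth)
        count += int(stat >= observed)
    return (count + 1) / (n_perm + 1)

# Toy data: features of a target group and a shifted source group.
rng = np.random.default_rng(0)
target = rng.normal(0.0, 1.0, size=(100, 2))
source = rng.normal(1.0, 1.0, size=(100, 2))

# Hypothetical stand-in for a counterfactual edit: remove the mean offset.
edited = source - (source.mean(axis=0) - target.mean(axis=0))

mmd_original = mmd2_biased(source, target)
mmd_edited = mmd2_biased(edited, target)
p_original = permutation_p_value(source, target)
p_edited = permutation_p_value(edited, target)

print(f"MMD^2 original: {mmd_original:.4f}, edited: {mmd_edited:.4f}")
print(f"p-value original: {p_original:.4f}, edited: {p_edited:.4f}")
```

In this toy setting the edited source set produces a smaller statistic and a larger p-value than the original source set, which mirrors the kind of distribution-level comparison the abstract reports.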

Source: arXiv Generated at: 2026-06-04 00:00:00 UTC