A Geometric View of Counterfactual Behavior: Interaction of Boundary Proximity and Local Support
Title: Interpreting Counterfactual Dynamics: The Role of Boundary Proximity and Local Data Support
Abstract
Counterfactual explanations identify minimal yet semantically meaningful modifications to an input that change a model’s output, and they have become essential tools for interpreting and auditing machine learning systems. In contemporary vision, language, and multimodal architectures, a pretrained encoder typically maps inputs into a representation space in which a downstream classifier head defines the decision boundaries. Consequently, whether a nearby counterfactual exists, and how far away it lies, depends heavily on how these boundaries are positioned relative to the underlying data distribution. However, models with comparable predictive accuracy may differ significantly in whether such counterfactuals can be found and in how much movement within the representation space they require.
This study investigates these discrepancies by applying a standardized local search probe to various pretrained encoders paired with linear classifier heads. Our findings reveal that although predictive performance remains consistent across models, their counterfactual behaviors diverge markedly. Notably, when the representations are held fixed, modifying only the classifier head can substantially alter counterfactual outcomes without affecting predictive accuracy. We attribute this phenomenon to the interplay between the proximity of the decision boundary and the density of local data support, two factors that jointly determine whether a prediction shift is feasible and whether it lands in a data-supported region. Furthermore, accounting for this interaction can improve counterfactual search for a fixed model. Taken together, these results establish counterfactual behavior as a property distinct from predictive performance, one that can be changed without compromising accuracy. This distinction carries significant implications for model selection, robustness assessment, and the trustworthiness of counterfactual methods.
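The two quantities named above can be illustrated with a minimal sketch. It is not the probe used in this study: it assumes a single linear head over fixed 2-D embeddings, uses the closed-form projection onto the hyperplane in place of a local search, and measures local support as the mean distance to the k nearest reference embeddings. All function names, the toy data, and the two heads are hypothetical choices made for illustration.

```python
import numpy as np

def boundary_distance(z, w, b):
    """L2 distance from embedding z to the hyperplane w.z + b = 0."""
    return abs(float(np.dot(w, z) + b)) / float(np.linalg.norm(w))

def nearest_counterfactual(z, w, b, overshoot=1e-3):
    """Project z onto the hyperplane, then step slightly past it so the sign flips."""
    w = np.asarray(w, dtype=float)
    margin = float(np.dot(w, z) + b)
    step = (1.0 + overshoot) * margin / float(np.dot(w, w))
    return np.asarray(z, dtype=float) - step * w

def local_support(point, reference, k=5):
    """Mean distance to the k nearest reference embeddings (lower = better supported)."""
    dists = np.linalg.norm(reference - point, axis=1)
    return float(np.sort(dists)[:k].mean())

def accuracy(X, y, w, b):
    """Fraction of embeddings whose predicted label (w.x + b > 0) matches y."""
    return float(np.mean((X @ w + b > 0).astype(int) == y))

# Toy "frozen" embeddings: two well-separated uniform clusters (assumed data).
rng = np.random.default_rng(0)
neg = rng.uniform(-0.5, 0.5, size=(50, 2)) + np.array([-3.0, 0.0])
pos = rng.uniform(-0.5, 0.5, size=(50, 2)) + np.array([3.0, 0.0])
X = np.vstack([neg, pos])
y = np.array([0] * 50 + [1] * 50)

# Two hypothetical heads on the same embeddings; both separate the clusters.
head_a = (np.array([1.0, 0.0]), 0.0)   # boundary at x = 0
head_b = (np.array([1.0, 0.0]), 1.5)   # boundary at x = -1.5

z = np.array([-3.0, 0.0])              # query embedding in the negative cluster
results = {}
for name, (w, b) in {"a": head_a, "b": head_b}.items():
    cf = nearest_counterfactual(z, w, b)
    results[name] = {
        "accuracy": accuracy(X, y, w, b),
        "distance": boundary_distance(z, w, b),
        "flipped": bool(np.dot(w, cf) + b > 0),
        "support": local_support(cf, X),
    }
    print(name, results[name])
```

In this toy setting both heads classify every point correctly, yet the query sits at a different distance from each boundary, and the counterfactual each head yields lies at a different distance from the reference data. This mirrors, in a simplified way, the claim that counterfactual behavior can vary while accuracy does not.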
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC




