Do Explanations Increase the Risk of Decision Logic Leakage? Explanation-Guided Stealing of Graph Models
Title: Does Providing Explanations Heighten the Vulnerability to Decision Logic Leakage? A Study on Explanation-Guided Model Theft
Abstract: Graph Neural Networks (GNNs) are now indispensable for processing graph-structured data in fields like financial analysis and drug discovery, creating an urgent need for greater model transparency. While recent developments in explainable GNNs meet this demand by highlighting key subgraphs that drive predictions, these very mechanisms may unintentionally expose the models to security threats. This study examines how such explanations can inadvertently leak critical decision logic, which adversaries can exploit to steal models. We introduce EGSteal, a novel framework for model theft that combines explanation alignment to capture decision logic with guided data augmentation to facilitate efficient training even when query access is limited. This approach allows for the effective replication of both the predictive outputs and the underlying reasoning patterns of target models. Evaluations on molecular graph datasets reveal that our method outperforms traditional stealing techniques. These findings underscore significant security concerns regarding the deployment of explainable GNNs in sensitive sectors and point to the necessity of implementing safeguards against explanation-based attacks. The source code is accessible at https://github.com/beanmah/EGSteal.
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



