A unifying Bayesian framework for adversarial robustness
Title: A Unified Bayesian Approach to Adversarial Robustness
Abstract: The susceptibility of machine learning systems to adversarial attacks poses a persistent and significant threat to societal security. Conventional mitigation techniques, such as adversarial training, typically improve robustness by optimizing the worst-case loss. However, these deterministic methods ignore the inherent uncertainty about the adversary’s actions. Some stochastic defenses address this by placing probability distributions over potential attacks, but they often lack statistical rigor and leave their foundational assumptions implicit. To address these shortcomings, we propose a formal Bayesian framework that models adversarial uncertainty through a stochastic channel, thereby making all probabilistic assumptions explicit. The framework yields two robustification strategies: a proactive defense applied at training time, which corresponds to adversarial training, and a reactive defense applied at deployment time, akin to adversarial purification. Several state-of-the-art defenses arise as limiting cases of our framework. We validate the approach empirically and demonstrate the benefits of explicitly modeling adversarial uncertainty.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





