FoeGlass: Simple In-Context Learning Is Enough for Red Teaming Audio Deepfake Detectors
Abstract: Audio deepfake detection (ADD) systems play a vital role in mitigating harmful uses of text-to-speech (TTS) technologies. To properly assess and strengthen these detectors, it is essential to build datasets that comprehensively cover the space of synthesized audio and pinpoint where models perform poorly. Current approaches to dataset creation face two main obstacles: they rely on manual data collection, and they are inefficient at finding vulnerabilities in ADD models. To overcome these limitations, we introduce FoeGlass, a novel black-box automated red-teaming framework for ADD systems. Leveraging the in-context learning abilities of large language models (LLMs), FoeGlass navigates the input space of a TTS model to produce audio samples that deceive the target ADD, requiring only black-box access to every system component. In doing so, it uncovers failure modes in the space of generated audio that current state-of-the-art deepfake benchmarks leave largely unexplored. By grounding the LLM's context in diversity metrics, FoeGlass mitigates the mode collapse commonly seen in automated red-teaming tools. Extensive experiments on several open-source ADD and TTS models show that data produced by FoeGlass increases false negative rates by up to 94% compared with unconditional sampling baselines and recent spoofing datasets, without any manual oversight. We further demonstrate that FoeGlass-generated attacks transfer across distinct ADD targets; together with its fully automated operation, this makes FoeGlass a versatile and easy-to-use tool for red-teaming ADD systems. Finally, we show that fine-tuning ADD models on samples generated by FoeGlass significantly improves detector robustness, with gains of up to 41%.
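The abstract describes the search only at a high level, so the following is a minimal Python sketch of what a black-box, in-context red-teaming loop of this kind might look like, not FoeGlass's actual implementation. Everything in it is a hypothetical assumption: `propose` stands in for the LLM, `fools_detector` stands in for "run the TTS model, score the audio with the target ADD, and report whether the fake was accepted as bona fide", and the diversity metric (word-level Jaccard distance with greedy max-min selection of in-context examples) is a swapped-in illustrative choice. The false negative rate is computed here as the fraction of synthetic samples the detector fails to flag as fake.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Attempt:
    text: str     # TTS input proposed by the (hypothetical) LLM stand-in
    fooled: bool  # True if the detector accepted the synthetic audio as bona fide


def false_negative_rate(attempts: List[Attempt]) -> float:
    """Fraction of synthetic samples the detector failed to flag as fake."""
    if not attempts:
        return 0.0
    return sum(a.fooled for a in attempts) / len(attempts)


def jaccard_distance(a: str, b: str) -> float:
    """Word-level Jaccard distance; an assumed, crude stand-in for a diversity metric."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def build_context(history: List[Attempt], k: int) -> List[str]:
    """Greedy max-min selection of up to k distinct successful attacks, so the
    in-context examples stay spread out instead of collapsing onto one mode."""
    pool = list(dict.fromkeys(a.text for a in history if a.fooled))  # dedupe, keep order
    chosen: List[str] = []
    while pool and len(chosen) < k:
        # Pick the candidate whose nearest already-chosen example is farthest away.
        best = max(pool, key=lambda t: min((jaccard_distance(t, c) for c in chosen), default=1.0))
        chosen.append(best)
        pool.remove(best)
    return chosen


def red_team(
    propose: Callable[[List[str]], List[str]],
    fools_detector: Callable[[str], bool],
    rounds: int,
    k: int = 3,
) -> List[Attempt]:
    """Black-box loop: the proposer only sees past successes, and the detector
    is only queried for a pass/fail outcome (no gradients or model internals)."""
    history: List[Attempt] = []
    for _ in range(rounds):
        context = build_context(history, k)
        for text in propose(context):
            history.append(Attempt(text, fools_detector(text)))
    return history


# --- Toy usage with hypothetical stubs in place of the LLM, TTS, and ADD ---
def toy_propose(context: List[str]) -> List[str]:
    # Hypothetical LLM stand-in: extend the first in-context example.
    base = context[0] if context else "read the news"
    return [base + " calmly", base + " loudly"]


def toy_fools_detector(text: str) -> bool:
    # Hypothetical TTS + ADD stand-in: pretend "loudly" prompts evade detection.
    return "loudly" in text


history = red_team(toy_propose, toy_fools_detector, rounds=2)
print(false_negative_rate(history))  # 0.75
```

The max-min selection is one simple way to keep the context diverse: a near-duplicate of an already chosen example scores close to zero and is passed over in favour of a dissimilar success, which is the kind of mode collapse the abstract says the diversity-grounded context is meant to counter.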
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC