The Alignment Curse: Modality Alignment Supercharges Audio Attacks via Text Transfer
Original (arXiv:2602.02557v2): Recent advances in end-to-end trained omni-models have substantially improved audio capabilities by strengthening text-audio modality alignment. However, whether such alignment inadvertently facilitates the transfer of safety vulnerabilities across modalities remains underexplored. This question is critical as text-based jailbreak attacks are considerably more mature than audio-based ones; if they transfer systematically, current audio safety evaluations may underestimate risks originating from the text modality. In this paper, we introduce the Alignment Curse, a formally characterized and empirically validated principle showing that stronger modality alignment enables more effective transfer of attacks from text to audio, revealing a fundamental tension between capability and safety. Motivated by this principle, we conduct a comprehensive black-box evaluation of three attack categories on recent omni-models (e.g., Qwen2.5-Omni, Qwen3-Omni): text attacks, text-transferred audio attacks, and audio attacks. We find that text-transferred audio attacks perform comparably to, and often better than, audio-based attacks, exhibiting a clear advantage under audio-only access. This suggests that text-based vulnerabilities play a pivotal role in shaping audio safety risks. Finally, we empirically analyze the relationship between modality alignment and transfer effectiveness across attack methods and models, observing consistent support for the Alignment Curse: tighter modality alignment leads to more effective cross-modality attack transfer.
Rewrite: End-to-end trained omni-models have substantially improved their audio capabilities by strengthening text-audio modality alignment. Whether this alignment also lets safety vulnerabilities carry over from one modality to another, however, remains largely unexplored. The question matters because text-based jailbreak attacks are considerably more mature than audio-based ones: if text attacks transfer systematically, current audio safety evaluations may underestimate risks that originate in the text modality. To address this, we introduce the "Alignment Curse," a formally characterized and empirically validated principle stating that stronger modality alignment makes text-to-audio attack transfer more effective. The principle exposes a fundamental tension between capability and safety. Guided by it, we conduct a comprehensive black-box evaluation of recent omni-models, including Qwen2.5-Omni and Qwen3-Omni, across three attack categories: direct text attacks, text-transferred audio attacks, and native audio attacks. Text-transferred audio attacks perform comparably to, and often better than, native audio attacks, with a clear advantage when only audio input is available to the attacker. This suggests that vulnerabilities in the text modality play a pivotal role in shaping audio safety risks. Finally, an empirical analysis of the relationship between alignment strength and transfer effectiveness across attack methods and models consistently supports the Alignment Curse: tighter modality alignment yields more effective cross-modal attack transfer.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC