arXiv

Exploring Adversarial Robustness and Safety Alignment in Multilingual Multi-Modal Large Language Models

Investigating Adversarial Resilience and Safety Alignment in Multilingual Multi-Modal Large Language Models

Abstract

By incorporating visual perception into language reasoning, Multimodal Large Language Models (MLLMs) create a persistent attack vector that is vulnerable to adversarial manipulation. Existing research on MLLM robustness has predominantly centered on English-only applications, largely neglecting their behavior in multilingual contexts. This study fills that void by conducting a comprehensive analysis of adversarial robustness and multimodal safety across twelve distinct languages, focusing on open-source MLLMs that achieve multilingual proficiency via instruction tuning.

Our investigation utilizes gradient-based attacks to uncover a significant, transferable vulnerability: adversarial images crafted to cause failures in one language remain effective across other languages, highlighting robust cross-lingual transferability. Furthermore, the degree of multilingual safety depends heavily on the model’s ability to accurately retrieve and interpret harmful directives. When malicious intent is conveyed through text, languages with stronger linguistic foundations are more prone to generating responses that facilitate misuse, whereas weaker languages result in fewer unsafe outputs. However, when harmful content is embedded within images as text, English scripts are consistently recognized and executed, while non-English scripts are seldom parsed by the vision encoder.

Consequently, lower-resource languages may seem safer, but this is merely an artifact of the model’s inability to comprehend or visually ground the input—a phenomenon we define as "safety-by-failure"—rather than true safety alignment. In contrast, MLLMs that integrate multilingual capabilities throughout their entire training lifecycle, such as Qwen3-VL, demonstrate authentic cross-lingual safety. These models maintain active refusal mechanisms across all languages instead of hiding comprehension gaps. Ultimately, shallow multilingual adaptations, like fine-tuning on translated instruction data, often yield only superficial understanding that creates a false sense of security in low-resource settings. Conversely, deep integration across various training phases fosters genuine multilingual safety alignment.


Source: arXiv Generated at: 2026-06-03 00:00:00 UTC

Related Articles

TechCrunch

The world’s largest privately owned laser just turned on

Xcimer Energy activated the Phoenix laser, the world’s largest privately owned laser, aiming to commercialize fusion pow...

Uber Targets Doubling Its Fleet of Electric Motorcycles in Kenya
Bloomberg

Uber Targets Doubling Its Fleet of Electric Motorcycles in Kenya

Uber plans to double its electric motorcycle fleet in Kenya. This expansion aims to enhance sustainable transport option...

AI Saves Time But Most Companies Waste the Gain, Study Shows
Bloomberg

AI Saves Time But Most Companies Waste the Gain, Study Shows

A study reveals that while AI saves employee time, most companies fail to capitalize on these gains, squandering potenti...

JPMorgan Lifts S&P Target on Earnings 'Supercycle'
Bloomberg

JPMorgan Lifts S&P Target on Earnings 'Supercycle'

JPMorgan raised its S&P 500 target, citing an earnings “supercycle” that reflects heightened confidence in corporate pro...

Europe Sleepwalking Into Economic Ruin, Serb Leader Says
Bloomberg

Europe Sleepwalking Into Economic Ruin, Serb Leader Says

Serbian leader warns Europe is sleepwalking into economic ruin.

Delta Electronics Flags Power Crunch
Bloomberg

Delta Electronics Flags Power Crunch

Delta Electronics warns of a looming power deficit due to surging demand and constrained production, predicting serious ...