arXiv

Investigating Adversarial Robustness of Multi-modal Large Language Models

Title: Examining the Adversarial Resilience of Multi-modal Large Language Models

Abstract: While Multi-modal Large Language Models (MLLMs) demonstrate impressive capabilities in vision-language tasks, integrating visual inputs through encoders such as CLIP significantly broadens their attack surface and leaves them susceptible to visual adversarial perturbations. Previous defenses have generally preserved compatibility with pre-trained MLLMs by imposing rigid constraints that keep the encoder aligned with CLIP’s original embedding space during adversarial fine-tuning. Although practical, this approach inherently limits the level of robustness that can be achieved. This study offers a comprehensive analysis of adversarial robustness in MLLMs. We propose a diagnostic CLIP-alignment protocol designed to forecast, before full MLLM training begins, which robust vision encoders will transfer effectively to multi-modal settings. Our findings indicate that large-scale multimodal adversarial pretraining, rather than unimodal scale alone, is the pivotal factor for successful robustness transfer. By incorporating these stronger encoders into MLLMs through end-to-end multimodal training, we observe average improvements of 28 CIDEr points in captioning and an 11.7% increase in VQA accuracy under strong adversarial conditions, outperforming constrained plug-and-play baselines. Additionally, we demonstrate that applying adversarial training directly to a standard, non-robust MLLM degrades performance on both clean and adversarial data, indicating that robust visual representations are a necessary foundation. In contrast, end-to-end adversarial training that starts from a robust backbone provides further gains of 1.9 CIDEr points and 4.3% VQA accuracy. Beyond training-time solutions, we highlight lightweight test-time visual stochastic transformations as an effective black-box defense for non-robust MLLMs, boosting adversarial performance from near-zero levels to match that of robust models. Lastly, we confirm that our robust models significantly mitigate toxic generation under white-box visual jailbreak attacks. The code and pre-trained weights will be made publicly available.
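
The abstract does not say which stochastic transformations are used, so the following is only a minimal sketch of the general idea of test-time input randomization, not the authors' method. It assumes a random pixel shift with padding as the transformation and a majority vote over several randomized copies as the aggregation rule. The names `random_crop_and_pad`, `predict_with_randomization`, and the `model` callable are hypothetical and introduced purely for illustration.

```python
import random


def random_crop_and_pad(image, max_shift=2, fill=0, rng=None):
    """Randomly shift an image by up to max_shift pixels, padding with fill.

    image: list of rows, each a list of pixel values (H x W).
    Returns a new H x W image; the input is not modified.
    (Assumed transformation, chosen only for illustration.)
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src_y, src_x = y + dy, x + dx
            # Copy the source pixel only if the shifted index is in bounds.
            if 0 <= src_y < height and 0 <= src_x < width:
                out[y][x] = image[src_y][src_x]
    return out


def predict_with_randomization(model, image, n_samples=5, seed=None):
    """Majority vote over answers on independently transformed copies.

    model: hypothetical black-box callable mapping an image to a hashable
    answer (for example a short VQA answer string).
    """
    rng = random.Random(seed)
    votes = {}
    for _ in range(n_samples):
        answer = model(random_crop_and_pad(image, rng=rng))
        votes[answer] = votes.get(answer, 0) + 1
    return max(votes, key=votes.get)
```

The intuition behind this kind of defense is that an adversarial perturbation is optimized for one exact pixel layout, so a small random change to the input at inference time can weaken it without requiring access to the model's weights.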


Source: arXiv Generated at: 2026-06-03 00:00:00 UTC
