arXiv

RogueMerge: Robust and Unified Attacks against LLM Model Merging

Abstract:

Model merging integrates specialized functionalities into a single Large Language Model (LLM) by aggregating task vectors obtained from unverified public repositories, thereby creating a significant vulnerability in the supply chain. Since task vectors can embed malicious behaviors, the merging process effectively grants third-party inputs direct write access to model weights, allowing attackers to trigger or exacerbate various downstream threats. Previous research has primarily focused on backdoor attacks targeting classifiers through static arithmetic heuristics. However, these methods are ill-suited for generative LLMs due to three fundamental limitations: (i) LLMs utilize autoregressive decoding, meaning that the slight parameter shifts caused by merging accumulate across tokens, rapidly diminishing the attack’s efficacy; (ii) attackers lack insight into the victim’s specific merging configurations, causing isolated, static attack vectors to be easily diluted or nullified; and (iii) practical threats must generalize to unseen attack prompts, a requirement that static vectors cannot meet.
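To make the "write access" point concrete, here is a minimal sketch of task-vector merging in the style of plain task arithmetic. It is not taken from the paper: the names `task_vector` and `merge`, the single `scale` coefficient, and the toy weights are all assumptions for illustration. Real merging algorithms differ in how they weight, trim, or resolve conflicts between vectors.

```python
from typing import Dict, List

# Hypothetical toy representation: parameter name -> flat list of weights.
Weights = Dict[str, List[float]]


def task_vector(finetuned: Weights, base: Weights) -> Weights:
    """Task vector = fine-tuned weights minus base weights, per parameter."""
    return {k: [f - b for f, b in zip(finetuned[k], base[k])] for k in base}


def merge(base: Weights, vectors: List[Weights], scale: float) -> Weights:
    """Add every task vector to the base weights, scaled by one coefficient.

    `scale` is chosen by the person doing the merge, so the author of any
    single task vector does not know it in advance.
    """
    merged = {k: list(v) for k, v in base.items()}  # copy, leave base intact
    for vec in vectors:
        for k, delta in vec.items():
            merged[k] = [m + scale * d for m, d in zip(merged[k], delta)]
    return merged


base = {"w": [1.0, 2.0]}
benign = {"w": [0.5, 0.0]}      # assumed trusted task vector
untrusted = {"w": [0.0, -4.0]}  # assumed third-party task vector

merged = merge(base, [benign, untrusted], scale=0.5)
print(merged["w"])
```

In this sketch the untrusted vector changes the second weight directly, but only by half of what its author encoded, because the merger picked `scale=0.5`. That dilution is the effect limitation (ii) describes.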

We introduce RogueMerge, the first comprehensive framework designed to overcome these three challenges. To counter the compounding effects of autoregressive generation, we substitute static arithmetic with a joint optimization process that explicitly ensures attack success post-merging. To address unknown merging parameters, we treat attack injection as a stochastic min-max problem, resolving it through meta-learning-style simulations. Furthermore, to ensure robustness across diverse attack prompts, we implement distributionally robust optimization, deriving a tractable first-order Taylor approximation suitable for LLMs with a provable error bound. Evaluations across four threat types, six merging algorithms, and more than 170 merged LLMs demonstrate that RogueMerge consistently surpasses existing attack methods. Additionally, it maintains stability under varying merging conditions and resists conventional defensive measures.
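The min-max idea can be illustrated with a toy one-parameter problem. This is a sketch under strong assumptions, not the paper's algorithm: the quadratic `attack_loss`, the fixed list of simulated merging scales, and the subgradient loop in `robust_delta` are all hypothetical stand-ins. The attacker picks a vector `delta`, the merger applies an unknown scale, and the attacker minimizes the loss under the worst simulated scale instead of assuming one fixed scale.

```python
from typing import List, Tuple

BASE = 0.0    # toy base weight (assumption)
TARGET = 1.0  # merged weight at which the toy "attack" succeeds (assumption)


def attack_loss(delta: float, scale: float) -> float:
    """Squared distance between the merged weight and the attacker's target."""
    merged = BASE + scale * delta
    return (merged - TARGET) ** 2


def worst_case(delta: float, scales: List[float]) -> Tuple[float, float]:
    """Return (loss, scale) for the simulated merge that hurts the attack most."""
    return max((attack_loss(delta, s), s) for s in scales)


def robust_delta(scales: List[float], steps: int = 200, lr: float = 0.05,
                 init: float = 1.0) -> Tuple[float, float]:
    """Subgradient descent on the worst-case loss; returns the best iterate."""
    delta = init
    best_delta, best_loss = delta, worst_case(delta, scales)[0]
    for _ in range(steps):
        _, s = worst_case(delta, scales)               # inner max: pick worst scale
        grad = 2.0 * s * (BASE + s * delta - TARGET)   # gradient at that scale
        delta -= lr * grad                             # outer min: descend
        loss = worst_case(delta, scales)[0]
        if loss < best_loss:
            best_delta, best_loss = delta, loss
    return best_delta, best_loss


simulated_scales = [0.3, 0.5, 1.0]  # hypothetical merging coefficients

# Static baseline: a vector tuned only for scale = 1.0.
static_loss, _ = worst_case(TARGET - BASE, simulated_scales)
delta_star, robust_loss = robust_delta(simulated_scales)
print(f"static worst-case loss: {static_loss:.3f}")
print(f"robust worst-case loss: {robust_loss:.3f} at delta = {delta_star:.3f}")
```

The static vector is exact when the scale is 1.0 but degrades sharply at 0.3, while the robust vector trades some accuracy at every scale for a lower worst-case loss. A real attack on an LLM would replace the scalar loss with a loss over generated tokens and would sample merging configurations rather than enumerate them.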


Source: arXiv Generated at: 2026-06-03 00:00:00 UTC
