arXiv

Backdoor Unlearning Generalization: A Path Toward the Removal of Unknown Triggers in LLMs

Abstract:

The security of Large Language Models (LLMs) is increasingly threatened by backdoor attacks, which enable adversaries to manipulate model outputs. Current defense mechanisms suffer from a significant structural disadvantage: they typically address backdoors individually and rely on prior knowledge of the specific triggers involved. This approach is inadequate when the model contains unknown backdoors. In this work, we demonstrate that backdoor neutralization via unlearning exhibits generalization capabilities. Specifically, we find that training a model to disregard a single trigger can inadvertently suppress other backdoors that were not explicitly targeted during the unlearning process.

We investigate this phenomenon across three distinct model families, where backdoors were introduced through either pretraining or continual pretraining, by systematically analyzing the models resulting from the removal of one backdoor at a time. To elucidate the mechanisms behind this cross-backdoor suppression, we propose the Cross Activation Shift Distance, a metric designed to quantify the divergence between model state changes caused by different training procedures. Our findings suggest a novel avenue for enhancing LLM safety: defenders could intentionally introduce and subsequently eliminate controlled backdoors. This strategy leverages cross-backdoor transfer effects to neutralize unknown threats that attackers may have previously embedded within the model.
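
To make the idea of comparing "model state changes" concrete, the following is a minimal sketch and not the paper's definition: the abstract does not give the formula for the Cross Activation Shift Distance. The sketch assumes that a state change can be summarized as the difference in mean hidden activations before and after a training procedure, and that two such changes can be compared by cosine distance. The function names (`mean_activation`, `activation_shift`, `shift_distance`) and the toy activation values are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_activation(acts: Sequence[Vector]) -> List[float]:
    """Average per-prompt activation vectors into one vector."""
    n = len(acts)
    return [sum(column) / n for column in zip(*acts)]


def activation_shift(before: Sequence[Vector], after: Sequence[Vector]) -> List[float]:
    """Mean activation change caused by one training procedure (assumed summary)."""
    b, a = mean_activation(before), mean_activation(after)
    return [ai - bi for bi, ai in zip(b, a)]


def shift_distance(shift_a: Vector, shift_b: Vector) -> float:
    """Cosine distance between two shifts: 0 = same direction, 2 = opposite.

    Assumption: cosine distance stands in for the paper's unspecified metric.
    """
    dot = sum(x * y for x, y in zip(shift_a, shift_b))
    norm_a = math.sqrt(sum(x * x for x in shift_a))
    norm_b = math.sqrt(sum(y * y for y in shift_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("a zero shift has no direction to compare")
    return 1.0 - dot / (norm_a * norm_b)


# Toy activations for two prompts, two hidden dimensions (made-up numbers).
base = [[0.0, 0.0], [2.0, 0.0]]
after_unlearning_a = [[1.0, 1.0], [3.0, 1.0]]    # hypothetical: trigger A unlearned
after_unlearning_b = [[2.0, 2.0], [4.0, 2.0]]    # hypothetical: trigger B unlearned
after_other = [[-1.0, -1.0], [1.0, -1.0]]        # hypothetical: unrelated procedure

shift_a = activation_shift(base, after_unlearning_a)
shift_b = activation_shift(base, after_unlearning_b)
shift_other = activation_shift(base, after_other)

print(shift_distance(shift_a, shift_b))
print(shift_distance(shift_a, shift_other))
```

Under these assumptions, a small distance between the shifts from unlearning two different triggers would be consistent with the cross-backdoor suppression described above, while a large distance would suggest the two procedures move the model in unrelated directions.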


Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC
