Visual-Noise Guided In-Context Distillation for Multimodal Large Language Model Unlearning
Abstract
Multimodal Large Language Models (MLLMs) have demonstrated significant advancements in vision-language applications; however, their tendency to memorize and disclose sensitive or restricted information has sparked serious concerns regarding privacy and overall safety. Machine Unlearning (MU) has emerged as a viable solution, enabling the removal of specific unwanted knowledge from trained models without the need for complete retraining, thereby maintaining general utility. Despite this potential, achieving effective unlearning in MLLMs is notably difficult.
Current training-based approaches frequently face difficulties in striking a balance between unlearning efficacy and model performance. Conversely, training-free strategies, such as in-context unlearning, safeguard model utility by eschewing parameter updates. Yet, these methods fail to eliminate memorized data at the parameter level and remain susceptible to reverse-engineering attacks. Furthermore, in-context unlearning proves inadequate in multimodal environments, where visual inputs exert strong conditioning signals that can trigger unwanted outputs.
To overcome these limitations, we introduce Visual-Noise Guided In-Context Distillation (VGID), a framework for MLLM unlearning based on distillation. VGID dynamically generates an unlearning-focused teacher distribution from the frozen base model via dual-modal intervention. This process integrates textual in-context unlearning with visual perturbation. The distribution induced by these interventions acts as a teacher signal, steering the student model toward parameter-level unlearning. Notably, this approach eliminates the necessity for external teacher models or explicit annotations of undesirable responses.
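The teacher-student setup described above can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the abstract does not state the noise distribution, the prompt format, or the distillation objective, so Gaussian pixel noise and a KL-divergence loss are assumptions made here for illustration. The linear `toy_model`, the `prompt_bias` vectors that stand in for the effect of a text prompt, and the function name `vgid_style_loss` are all hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def add_visual_noise(pixels, sigma, rng):
    """Assumed visual perturbation: Gaussian noise, clipped to [0, 1]."""
    return [min(1.0, max(0.0, x + rng.gauss(0.0, sigma))) for x in pixels]


def toy_model(weights, pixels, prompt_bias):
    """Hypothetical stand-in for an MLLM: logits = W @ pixels + prompt_bias.

    prompt_bias crudely represents how a text prompt shifts the output.
    """
    return [
        sum(w * x for w, x in zip(row, pixels)) + b
        for row, b in zip(weights, prompt_bias)
    ]


def vgid_style_loss(frozen_w, student_w, pixels,
                    unlearn_bias, plain_bias, sigma, rng):
    """Distillation loss between an intervened teacher and a plain student.

    Teacher: frozen weights, noised image, in-context unlearning prompt.
    Student: trainable weights, clean image, ordinary prompt.
    The KL objective is an assumption, not taken from the source.
    """
    noisy = add_visual_noise(pixels, sigma, rng)
    teacher = softmax(toy_model(frozen_w, noisy, unlearn_bias))
    student = softmax(toy_model(student_w, pixels, plain_bias))
    return kl_divergence(teacher, student)


if __name__ == "__main__":
    rng = random.Random(0)
    w = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.1]]  # 3 output tokens, 2 pixels
    loss = vgid_style_loss(
        frozen_w=w, student_w=w, pixels=[0.6, 0.9],
        unlearn_bias=[0.0, 0.0, 3.0],  # prompt pushes mass toward token 2
        plain_bias=[0.0, 0.0, 0.0],
        sigma=0.2, rng=rng,
    )
    print(f"distillation loss: {loss:.4f}")
```

In this toy setting the student starts as a copy of the frozen model, so any nonzero loss comes entirely from the two interventions on the teacher side. Minimizing it with respect to the student weights would move the student's clean-input behavior toward the intervened distribution, which is the sense in which the teacher signal requires neither an external model nor annotated refusal responses.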
Experimental evaluations indicate that VGID delivers robust unlearning performance while maintaining competitive model utility. In a representative scenario, the method lowers the forget-set ROUGE-L score by 0.371 while the retain-set ROUGE-L score drops by only 0.055.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




