arXiv

Safe-FedLLM: Delving into the Safety of Federated Large Language Models

June 2, 2026 · Mingxiang Tao, Yu Tian, Wenxuan Tu, Yue Yang, Xue Yang, Xiangyan Tang


Abstract: Federated learning (FL) offers a solution to the challenges of data silos and privacy preservation during the training of large language models (LLMs). While existing research has primarily concentrated on enhancing the efficiency of federated learning for LLMs (FedLLM), the security aspects of open federated environments—specifically mechanisms to defend against malicious participants—have received limited attention. To address this gap, we initiate a preliminary investigation into the security of FedLLM by examining potential attack vectors and defensive capabilities through the lens of LoRA updates. Our analysis reveals two critical findings: first, LLMs are susceptible to attacks originating from malicious clients within an FL setting; second, LoRA updates display unique behavioral signatures that allow lightweight classifiers to distinguish them effectively. Leveraging these insights, we introduce Safe-FedLLM, a defense framework based on probing. This system implements protection at three distinct tiers: Step-Level, Client-Level, and Shadow-Level. The fundamental principle of Safe-FedLLM involves conducting probe-based discrimination on the local LoRA updates of each client. By treating these updates as high-dimensional behavioral features, a lightweight classifier is employed to identify potential malicious activity. Comprehensive experiments confirm that Safe-FedLLM significantly bolsters the robustness of FedLLM against malicious clients without compromising performance on legitimate data. Importantly, the method successfully mitigates the influence of malicious data while preserving training speed and demonstrating resilience even when the proportion of malicious clients is high.
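To make the probing idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes each client's LoRA update can be flattened into a feature vector, that a small labeled set of benign and malicious updates is available to train the probe, and that flagged clients are simply dropped before a plain average. The names `flatten_update`, `train_probe`, `malicious_score`, `filter_and_average`, and `make_update`, the toy 4-dimensional updates, and the 0.5 threshold are all hypothetical; the abstract does not specify how the Step-Level, Client-Level, and Shadow-Level tiers work, so none of them is modeled here.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]
LoraUpdate = Dict[str, Matrix]  # hypothetical layout: matrix name -> values
Probe = Tuple[List[float], float]  # (weights, bias) of a logistic probe


def flatten_update(update: LoraUpdate) -> List[float]:
    """Concatenate all LoRA matrices (sorted by name) into one feature vector."""
    return [v for name in sorted(update) for row in update[name] for v in row]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_probe(features: List[List[float]], labels: List[int],
                lr: float = 0.5, epochs: int = 500) -> Probe:
    """Fit a logistic-regression probe with plain batch gradient descent."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def malicious_score(probe: Probe, update: LoraUpdate) -> float:
    """Probe output in (0, 1); higher means more likely malicious."""
    w, b = probe
    x = flatten_update(update)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def filter_and_average(updates: List[LoraUpdate], probe: Probe,
                       threshold: float = 0.5) -> Tuple[List[int], List[float]]:
    """Drop updates the probe flags, then average the remaining ones."""
    kept = [i for i, u in enumerate(updates)
            if malicious_score(probe, u) < threshold]
    if not kept:
        return kept, []
    vectors = [flatten_update(updates[i]) for i in kept]
    mean = [sum(col) / len(vectors) for col in zip(*vectors)]
    return kept, mean


def make_update(a: List[float], b: List[float]) -> LoraUpdate:
    """Toy update: matrix "A" is 1x2 and matrix "B" is 2x1 (assumed shapes)."""
    return {"A": [a], "B": [[b[0]], [b[1]]]}


# Synthetic training set: benign updates near 0, malicious updates near 1.
train_updates = [
    make_update([0.0, 0.1], [0.0, 0.1]),
    make_update([0.1, 0.0], [0.1, 0.0]),
    make_update([1.0, 0.9], [1.0, 0.9]),
    make_update([0.9, 1.0], [0.9, 1.0]),
]
train_labels = [0, 0, 1, 1]
probe = train_probe([flatten_update(u) for u in train_updates], train_labels)

# One simulated round: clients 0 and 2 look benign, client 1 looks malicious.
clients = [
    make_update([0.05, 0.05], [0.05, 0.05]),
    make_update([0.95, 0.95], [0.95, 0.95]),
    make_update([0.0, 0.1], [0.1, 0.0]),
]
kept, aggregated = filter_and_average(clients, probe)
print("kept clients:", kept)
```

The toy data is deliberately linearly separable, so the probe's behavior here says nothing about detection accuracy on real LoRA updates; it only illustrates the pipeline of featurizing updates, scoring them, and excluding flagged clients from aggregation.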
