arXiv

Sequential Data Poisoning in LLM Post-Training

Abstract: Large language model (LLM) post-training typically involves a multi-stage process, such as supervised fine-tuning (SFT) followed by reinforcement learning from human feedback (RLHF) or direct preference optimization (DPO). Each stage relies on data from distinct and potentially untrusted sources, yet existing research has focused on data poisoning attacks at individual stages and has overlooked scenarios involving multiple attackers. To assess the trustworthiness of the entire post-training pipeline, we introduce the threat model of sequential data poisoning, in which separate adversaries poison both the SFT and preference datasets. Our analysis reveals a "single-attacker illusion": when evaluated in isolation, each adversary appears to pose a minimal threat, but collaboration across stages exposes significant vulnerabilities. In the SFT $\to$ DPO pipeline, the attackers' effects are additive: distributing a fixed poison budget across stages yields a more effective attack than concentrating it in a single stage. Conversely, in the SFT $\to$ PPO pipeline, the contributions are complementary: poisoning the SFT data or the reward model alone fails, but poisoning both succeeds. These results demonstrate that security assessments of isolated post-training stages systematically underestimate the compound vulnerabilities arising from their interaction. Code is available at https://github.com/jcksanderson/sequential-poisoning.
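
To make the idea of distributing a fixed poison budget across stages concrete, the following is a minimal sketch and not the authors' implementation (their code lives in the repository linked above). The function names `split_poison_budget` and `inject_poison`, the `sft_share` parameter, and the toy datasets are all hypothetical and introduced only for illustration. The sketch assumes that poisoning means replacing a number of clean examples with attacker-chosen ones.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_poison_budget(total_budget: int, sft_share: float) -> Tuple[int, int]:
    """Split a fixed number of poisoned examples between two stages.

    Hypothetical helper: sft_share is the fraction of the budget spent on
    the SFT dataset; the remainder goes to the preference dataset.
    """
    if total_budget < 0:
        raise ValueError("total_budget must be non-negative")
    if not 0.0 <= sft_share <= 1.0:
        raise ValueError("sft_share must lie in [0, 1]")
    sft_count = round(total_budget * sft_share)
    return sft_count, total_budget - sft_count


def inject_poison(
    clean: Sequence[T], poison: Sequence[T], n_poison: int, seed: int = 0
) -> List[T]:
    """Return a copy of clean with n_poison examples replaced by poisoned ones.

    Assumption: replacement at random positions; the dataset size is unchanged.
    """
    if n_poison < 0 or n_poison > len(clean) or n_poison > len(poison):
        raise ValueError("n_poison exceeds the available clean or poisoned examples")
    rng = random.Random(seed)
    mixed = list(clean)
    # Choose distinct positions so each poisoned example overwrites one clean example.
    slots = rng.sample(range(len(mixed)), n_poison)
    for slot, example in zip(slots, poison[:n_poison]):
        mixed[slot] = example
    return mixed


if __name__ == "__main__":
    # Toy stand-ins for an SFT dataset and a preference dataset.
    sft_clean = [("sft-clean", i) for i in range(1000)]
    pref_clean = [("pref-clean", i) for i in range(1000)]
    sft_poison = [("sft-poison", i) for i in range(100)]
    pref_poison = [("pref-poison", i) for i in range(100)]

    # Compare concentrating the budget in one stage with splitting it evenly.
    for share in (1.0, 0.5, 0.0):
        n_sft, n_pref = split_poison_budget(100, share)
        sft_data = inject_poison(sft_clean, sft_poison, n_sft)
        pref_data = inject_poison(pref_clean, pref_poison, n_pref)
        print(f"sft_share={share}: {n_sft} SFT poisons, {n_pref} preference poisons")
```

The sketch only constructs the poisoned datasets. Measuring whether a split budget actually outperforms a concentrated one requires running the SFT and DPO (or PPO) training stages on the resulting data, which is outside the scope of this example.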


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
