STRIDE: Training Data Attribution via Sparse Recovery from Subset Perturbations
Title: STRIDE: Enabling Training Data Attribution Through Sparse Recovery and Subset Perturbations
Abstract: Training Data Attribution (TDA) aims to identify which specific training examples influenced a model’s outputs. Causal interventions, which measure how model behavior changes when data is added or removed, are considered the benchmark for TDA, but the computational burden of repeated retraining makes them impractical for Large Language Models (LLMs). As a result, most existing methods approximate these effects in parameter space using gradients. This strategy has two weaknesses: tracking gradients across billions of parameters is prohibitively expensive, and the local approximations it relies on may be imprecise.
To address these limitations, we propose a paradigm shift by modeling the functional impact of training data within the activation space, rather than estimating changes in parameters. We present STRIDE (Steering-based Training Data Influence Decomposition), a novel framework that treats TDA as a sparse recovery problem, drawing inspiration from compressive sensing. STRIDE develops lightweight "steering operators" that replicate the behavioral shifts resulting from training on specific data subsets. By analyzing how these operators alter test predictions, we can deduce the influence of individual training examples through sparse linear decomposition.
STRIDE achieves state-of-the-art performance for LLM pre-training attribution and runs an order of magnitude faster ($13\times$) than prior methods. We further demonstrate its practical value through applications in data selection, detection of data contamination, and qualitative analysis.
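The sparse-recovery framing described in the abstract can be illustrated with a toy sketch. This is not the STRIDE implementation: the steering operators are replaced here by a simulated linear measurement, and every name (`membership`, `effects`, `ista_lasso`), the subset sampling scheme, and all numeric settings are assumptions chosen for illustration. The sketch assumes that the effect of a training subset on one test prediction is roughly the sum of the influences of its members, and that only a few examples have non-zero influence. Under those assumptions, per-example influence can be estimated from far fewer subset measurements than there are training examples by solving an L1-regularized least-squares problem.

```python
import numpy as np


def ista_lasso(A, y, lam, n_iter=5000):
    """Minimise 0.5 * ||A x - y||^2 + lam * ||x||_1 with plain ISTA."""
    # Step size 1/L, where L is the Lipschitz constant of the smooth part.
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - A.T @ (A @ x - y) / L          # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x


rng = np.random.default_rng(0)
n_train, n_subsets, n_influential = 200, 100, 5  # illustrative sizes only

# Hypothetical ground truth: only a few training examples matter
# for this particular test prediction.
true_influence = np.zeros(n_train)
support = rng.choice(n_train, size=n_influential, replace=False)
true_influence[support] = rng.uniform(1.0, 2.0, size=n_influential)

# Each row marks which training examples belong to one random subset.
membership = rng.integers(0, 2, size=(n_subsets, n_train)).astype(float)

# Simulated measurement: the shift in the test prediction caused by each
# subset (in the paper's setting this would come from a steering operator;
# here it is an additive model plus small noise).
effects = membership @ true_influence + 0.01 * rng.standard_normal(n_subsets)

# Centre columns and targets so the shared offset of 0/1 membership
# does not dominate the recovery.
A = membership - membership.mean(axis=0)
y = effects - effects.mean()

estimate = ista_lasso(A, y, lam=1.0)
top = np.argsort(-np.abs(estimate))[:n_influential]

print("true influential examples:     ", sorted(support.tolist()))
print("recovered influential examples:", sorted(top.tolist()))
```

In this toy setting, 100 subset measurements are used to attribute influence across 200 training examples, which is the kind of saving that compressive sensing provides when the influence vector is sparse. The L1 penalty introduces a small shrinkage bias in the recovered magnitudes, so the ranking is more trustworthy than the exact values.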
Source: arXiv Generated at: 2026-06-04 00:00:00 UTC





