arXiv

Calibration Data Trade-offs Across Capability Dimensions: Why Multi-Source Mixing Matters for High-Sparsity LLM Pruning

Title: Navigating Calibration Data Trade-offs Across Capability Dimensions: The Strategic Value of Multi-Source Mixing in High-Sparsity LLM Pruning

Abstract

Recent studies suggest that the choice of unlabelled calibration data has a negligible effect on the averaged accuracy of large language models (LLMs) after post-training pruning to high sparsity. This conclusion warrants scrutiny when performance is assessed per capability domain rather than as a single aggregate metric. We decompose post-pruning performance into four dimensions (General knowledge, Commonsense reasoning, Code generation, and Mathematical ability) and evaluate 15 calibration sources using Spearman correlations between OIT information metrics and per-dimension retention rates. The correlations take opposing signs across dimensions, which reveals a significant trade-off.

Calibration perplexity correlates positively with General capability retention ($\rho = +0.71$) but negatively with both Math ($\rho = -0.53$) and Code ($\rho = -0.59$) retention ($p < 0.05$). Because these correlations point in opposite directions, no single calibration source can preserve all model capabilities simultaneously. To address this limitation, we introduce multi-source calibration mixing and propose IGSP, an information-guided self-calibration protocol. IGSP automates the construction of multi-source calibration sets without requiring corpora aligned with specific capabilities; it does so by minimizing 4-gram aggregation while balancing perplexity across dimensions.
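As an illustration of the per-dimension analysis described above, the following is a minimal sketch of a Spearman rank correlation between calibration perplexity and retention, computed separately for two dimensions. The perplexity and retention values are hypothetical placeholders chosen only to show opposing signs; they are not the paper's measurements, and the helper names `rank` and `spearman` are assumptions made for this sketch.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = rank(x), rank(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical values for five calibration sources (NOT from the paper).
perplexity = [5.0, 8.0, 12.0, 20.0, 35.0]
general_retention = [0.50, 0.55, 0.53, 0.62, 0.70]
math_retention = [0.60, 0.52, 0.55, 0.40, 0.35]

rho_general = spearman(perplexity, general_retention)
rho_math = spearman(perplexity, math_retention)
print(f"General: rho = {rho_general:+.2f}")
print(f"Math:    rho = {rho_math:+.2f}")
```

With real measurements, a positive rho for one dimension alongside a negative rho for another is the opposing-sign pattern the abstract describes. A significance test would additionally be needed to reproduce the reported $p < 0.05$.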

In experiments conducted on LLaMA-3.1-8B at 60% sparsity using SparseGPT, a uniform multi-source mixture achieved a total retention rate of 58.8%. This result outperforms the strongest single source, MetaMath (50.0%), by 8.8 percentage points, and surpasses the C4 default by 18.8 percentage points. Furthermore, IGSP demonstrates superior performance, improving upon Self-Cal by 2.4 points and SGS by 4.8 points.
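The uniform multi-source mixture mentioned above could be built roughly as follows. This is a sketch under stated assumptions, not the paper's pipeline: the function name `uniform_mix`, the source labels, and the sample counts are hypothetical, and each source is assumed to be an in-memory list of text samples.

```python
import random


def uniform_mix(sources, n_total, seed=0):
    """Draw a calibration set with (near-)equal counts from every source.

    sources: dict mapping a source name to a list of text samples.
    n_total: total number of calibration samples to return.
    """
    rng = random.Random(seed)
    names = sorted(sources)  # fixed order keeps the draw reproducible
    base, extra = divmod(n_total, len(names))
    mixed = []
    for idx, name in enumerate(names):
        # The first `extra` sources absorb the remainder, one sample each.
        quota = base + (1 if idx < extra else 0)
        pool = sources[name]
        if quota > len(pool):
            raise ValueError(f"source {name!r} has fewer than {quota} samples")
        mixed.extend(rng.sample(pool, quota))
    rng.shuffle(mixed)  # avoid grouping samples by source
    return mixed


# Hypothetical toy corpora; real sources would hold tokenizable documents.
toy_sources = {
    "general": [f"general-{i}" for i in range(10)],
    "commonsense": [f"commonsense-{i}" for i in range(10)],
    "code": [f"code-{i}" for i in range(10)],
    "math": [f"math-{i}" for i in range(10)],
}
calibration_set = uniform_mix(toy_sources, n_total=8)
print(calibration_set)
```

Sampling without replacement and fixing the seed keeps the mixture reproducible. IGSP, as described in the abstract, selects data by information criteria rather than by equal quotas, so this sketch covers only the uniform baseline.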


Source: arXiv Generated at: 2026-06-03 00:00:00 UTC
