arXiv

Fine-Tuning Without Forgetting In-Context Learning: A Theoretical Analysis of Linear Attention Models

Title: Preserving In-Context Learning During Fine-Tuning: Theoretical Insights into Linear Attention Mechanisms

Abstract: Transformer-based large language models exhibit in-context learning: they can adapt to new tasks from a few examples supplied in the prompt. Fine-tuning is commonly used to improve zero-shot performance, which removes the need for in-prompt examples and lowers inference cost, but it often degrades in-context learning. As a result, fine-tuned models may struggle on tasks that were not included in the fine-tuning dataset. In this study, we use linear attention models to give a theoretical account of how specific fine-tuning objectives change the attention parameters, and we identify the conditions under which few-shot performance declines. Our analysis shows that updating all attention parameters can impair in-context learning, whereas restricting updates to the value matrix improves zero-shot performance while preserving in-context learning. We also show that adding an auxiliary few-shot loss strengthens in-context learning primarily on the target task, though it may reduce in-context learning on unseen tasks. Experiments on both synthetic and real-world datasets support our theoretical predictions.
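
To make the value-matrix restriction concrete, here is a minimal NumPy sketch of a single softmax-free (linear) attention layer with one gradient step applied to the value matrix only. It is an illustration under stated assumptions, not the paper's construction: the squared-error loss, the 1/n scaling, the step-size rule, and every name (`attention_features`, `linear_attention`, `value_only_step`, `W_q`, `W_k`, `W_v`) are hypothetical choices made for this example.

```python
import numpy as np

def attention_features(Z, W_q, W_k):
    """Return A such that the layer output is A @ W_v (rows of Z are tokens)."""
    n = Z.shape[0]
    scores = (Z @ W_q) @ (Z @ W_k).T  # (n, n) unnormalized similarities, no softmax
    return scores @ Z / n             # (n, d)

def linear_attention(Z, W_q, W_k, W_v):
    """Linear attention output: scores @ (Z @ W_v) / n."""
    return attention_features(Z, W_q, W_k) @ W_v

def loss(Z, target, W_q, W_k, W_v):
    """Assumed objective for this sketch: half the squared error."""
    return 0.5 * np.sum((linear_attention(Z, W_q, W_k, W_v) - target) ** 2)

def value_only_step(Z, target, W_q, W_k, W_v):
    """One gradient step on the loss that updates W_v and leaves W_q, W_k untouched."""
    A = attention_features(Z, W_q, W_k)
    grad_v = A.T @ (A @ W_v - target)      # exact gradient, since output is linear in W_v
    lr = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / smoothness constant of the quadratic
    return W_q, W_k, W_v - lr * grad_v

rng = np.random.default_rng(0)
Z = rng.normal(size=(8, 4))        # 8 tokens of dimension 4 (synthetic data)
target = rng.normal(size=(8, 4))   # synthetic regression targets
W_q, W_k, W_v = (rng.normal(size=(4, 4)) for _ in range(3))

before = loss(Z, target, W_q, W_k, W_v)
new_q, new_k, new_v = value_only_step(Z, target, W_q, W_k, W_v)
after = loss(Z, target, new_q, new_k, new_v)
print(f"loss before: {before:.4f}, after: {after:.4f}")
```

Because the query and key matrices are returned unchanged, the attention scores computed from any prompt are identical before and after the step; only the readout through `W_v` moves. This sketch does not by itself demonstrate the paper's claims about zero-shot or few-shot performance.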


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
