Fine-Tuning Without Forgetting In-Context Learning: A Theoretical Analysis of Linear Attention Models

Chungpa Lee, Jy-yong Sohn, Kangwook Lee | February 26, 2026 | arXiv

Key Takeaway

Fine-tune only the value matrix in attention layers to improve zero-shot performance without breaking the model's ability to learn from in-context ...
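To make the takeaway concrete, here is a minimal NumPy sketch of a value-only update on a single linear attention layer. It is an illustration under stated assumptions, not the paper's setup: the layer form (scores `(X W_q)(X W_k)^T / n` with no softmax), the squared-error loss, and the names `linear_attention`, `loss`, and `value_only_step` are all hypothetical choices made for this example.

```python
import numpy as np


def linear_attention(X, W_q, W_k, W_v):
    """Single linear attention layer (no softmax); an assumed toy form."""
    n = X.shape[0]
    scores = (X @ W_q) @ (X @ W_k).T / n  # (n, n) attention scores
    return scores @ (X @ W_v)             # (n, d) outputs


def loss(X, Y, params):
    """Squared-error loss between layer output and targets Y."""
    out = linear_attention(X, params["W_q"], params["W_k"], params["W_v"])
    return 0.5 * np.sum((out - Y) ** 2)


def value_only_step(X, Y, params, lr=1e-3):
    """One gradient step that updates W_v and leaves W_q, W_k frozen."""
    W_q, W_k, W_v = params["W_q"], params["W_k"], params["W_v"]
    n = X.shape[0]
    scores = (X @ W_q) @ (X @ W_k).T / n
    M = scores @ X                 # the output is M @ W_v, linear in W_v
    residual = M @ W_v - Y
    grad_W_v = M.T @ residual      # gradient of the squared-error loss
    return {
        "W_q": W_q,                # frozen: returned unchanged
        "W_k": W_k,                # frozen: returned unchanged
        "W_v": W_v - lr * grad_W_v,
    }
```

Calling `value_only_step` repeatedly on task data changes only `W_v`, so the query and key matrices that determine the attention scores stay at their pretrained values. In an autograd framework the same idea is usually expressed by disabling gradients for every parameter except the value projection.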

Summary

Fine-tuning a language model to improve its zero-shot performance on specific tasks (that is, performance without examples in the prompt) often degrades its ability to learn from examples shown in the prompt, known as in-context learning.

training efficiency

Key Terms

in-context-learning, fine-tuning, linear-attention, zero-shot-performance