Fine-tune only the value matrix in attention layers to improve zero-shot performance without breaking the model's ability to learn from in-context ...
Fine-tuning a language model to perform better on specific tasks zero-shot, that is, with no examples in the prompt, often degrades its ability to learn from examples that are shown in the prompt (in-context learning).
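
As a rough illustration of what restricting fine-tuning to the value matrix could look like in practice, the sketch below freezes every parameter except the attention value projection weights. The helper name `freeze_all_but_value` and the module name `v_proj` are assumptions for illustration, not something the text specifies: parameter naming differs between model implementations, and some fuse the query, key, and value projections into one matrix, in which case name matching like this would not be enough.

```python
from typing import Any, Iterable, List, Tuple


def freeze_all_but_value(
    named_params: Iterable[Tuple[str, Any]],
    value_key: str = "v_proj",  # assumed module name; adjust to the model at hand
) -> List[str]:
    """Leave only the attention value projection weights trainable.

    `named_params` is an iterable of (name, parameter) pairs, for example the
    result of `model.named_parameters()` on a PyTorch module. Each parameter
    only needs a `requires_grad_(bool)` method, as PyTorch tensors have.
    Returns the names of the parameters that remain trainable.
    """
    trainable = []
    for name, param in named_params:
        parts = name.split(".")
        # Match "<...>.v_proj.weight" exactly. Biases stay frozen, which is an
        # assumption here: the text only mentions the value matrix.
        is_value_weight = (
            len(parts) >= 2 and parts[-2] == value_key and parts[-1] == "weight"
        )
        param.requires_grad_(is_value_weight)
        if is_value_weight:
            trainable.append(name)
    return trainable


# Hypothetical usage with a PyTorch model (not executed here):
#   trainable = freeze_all_but_value(model.named_parameters())
#   optimizer = torch.optim.AdamW(
#       (p for p in model.parameters() if p.requires_grad), lr=1e-5
#   )
```

Returning the list of trainable names makes it easy to check that the filter matched what was intended before starting a training run. An empty list usually means the model uses a different name for its value projection.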