Both privacy leakage and model performance hinge on a small set of critical weights—you can defend against privacy attacks by carefully fine-tuning just these weights instead of retraining the whole model.
This paper finds that privacy leaks in neural networks stem from a tiny fraction of weights, and that these same weights are crucial for model performance. Rather than retraining the entire model, the authors propose selectively rewinding only these critical weights during fine-tuning, defending against membership inference attacks while keeping the model accurate.
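The mechanism can be sketched in a few lines: score each weight for privacy risk, pick the small critical fraction, and rewind those weights to an earlier checkpoint before a brief fine-tuning pass. The sketch below is a minimal illustration, not the paper's implementation; the function name `rewind_critical_weights` and the score proxy (magnitude of the weight's change since the checkpoint) are assumptions for demonstration.

```python
import numpy as np

def rewind_critical_weights(final_w, checkpoint_w, scores, fraction=0.01):
    """Rewind the top `fraction` of weights (by score) to checkpoint values.

    final_w, checkpoint_w, scores: 1-D arrays of equal length.
    Returns (defended_weights, indices_of_rewound_weights).
    """
    k = max(1, int(fraction * final_w.size))
    # Indices of the k weights with the largest privacy-risk scores.
    critical = np.argsort(scores)[-k:]
    defended = final_w.copy()
    defended[critical] = checkpoint_w[critical]  # rewind only these weights
    return defended, critical

# Toy example with a hypothetical risk score: treat the weights that
# moved most since the checkpoint as the likely privacy-leaking ones.
rng = np.random.default_rng(0)
final_w = rng.normal(size=1000)       # stand-in for trained weights
checkpoint_w = rng.normal(size=1000)  # stand-in for an earlier checkpoint
scores = np.abs(final_w - checkpoint_w)

defended, critical = rewind_critical_weights(final_w, checkpoint_w, scores, 0.01)
```

In a full defense, the rewind would be followed by a short fine-tuning phase on clean data so the model recovers the accuracy those critical weights were carrying.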