Both privacy leakage and model performance hinge on a small set of critical weights—you can defend against privacy attacks by carefully fine-tuning just these weights instead of retraining the whole model.
This paper finds that privacy leaks in neural networks stem from a tiny fraction of weights, and that these same weights are crucial for model performance. Rather than retraining the entire model, the authors propose selectively rewinding only these critical weights during fine-tuning, defending against membership inference attacks while keeping the model accurate.
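The mechanism can be sketched in a few lines: score each weight for privacy risk, pick the small critical fraction, and rewind those weights to an earlier checkpoint before a brief fine-tuning pass. The sketch below is a minimal illustration, not the paper's implementation; the function name `rewind_critical_weights` and the score proxy (magnitude of the weight's change since the checkpoint) are assumptions for demonstration.

```python
import numpy as np

def rewind_critical_weights(final_w, checkpoint_w, scores, fraction=0.01):
    """Rewind the top `fraction` of weights (by score) to checkpoint values.

    final_w, checkpoint_w, scores: 1-D arrays of equal length.
    Returns (defended_weights, indices_of_rewound_weights).
    """
    k = max(1, int(fraction * final_w.size))
    # Indices of the k weights with the largest privacy-risk scores.
    critical = np.argsort(scores)[-k:]
    defended = final_w.copy()
    defended[critical] = checkpoint_w[critical]  # rewind only these weights
    return defended, critical

# Toy example with a hypothetical risk score: treat the weights that
# moved most since the checkpoint as the likely privacy-leaking ones.
rng = np.random.default_rng(0)
final_w = rng.normal(size=1000)       # stand-in for trained weights
checkpoint_w = rng.normal(size=1000)  # stand-in for an earlier checkpoint
scores = np.abs(final_w - checkpoint_w)

defended, critical = rewind_critical_weights(final_w, checkpoint_w, scores, 0.01)
```

In a full defense, the rewind would be followed by a short fine-tuning phase on clean data so the model recovers the accuracy those critical weights were carrying.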