Compressing models for efficiency can quietly amplify bias: you need to monitor fairness metrics during compression, not just overall accuracy.
This paper tackles a hidden problem in model compression: when you shrink neural networks to run faster, the compression can disproportionately hurt accuracy for certain groups (for example, patients with underrepresented skin tones in medical imaging).
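The monitoring the paper calls for can be sketched as a simple per-group check around any compression step. This is an illustrative sketch, not the paper's implementation; all function names and the NumPy-based evaluation are assumptions:

```python
import numpy as np

def per_group_accuracy(y_true, y_pred, groups):
    """Accuracy computed separately for each group label."""
    return {g: float(np.mean(y_pred[groups == g] == y_true[groups == g]))
            for g in np.unique(groups)}

def compression_fairness_report(y_true, pred_orig, pred_comp, groups):
    """Compare overall and per-group accuracy before vs. after compression."""
    before = per_group_accuracy(y_true, pred_orig, groups)
    after = per_group_accuracy(y_true, pred_comp, groups)
    return {
        "overall_before": float(np.mean(pred_orig == y_true)),
        "overall_after": float(np.mean(pred_comp == y_true)),
        # Per-group accuracy change; a large negative delta flags a group
        # the compression disproportionately hurt, even when the overall
        # accuracy drop looks small.
        "per_group_delta": {g: after[g] - before[g] for g in before},
    }

# Toy example: overall accuracy falls only 25%, but all of the damage
# lands on group 1, whose accuracy is halved.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])
pred_orig = y_true.copy()                       # uncompressed: all correct
pred_comp = np.array([1, 1, 1, 1, 1, 1, 0, 0])  # compressed: errs on group 1
report = compression_fairness_report(y_true, pred_orig, pred_comp, groups)
print(report["per_group_delta"])
```

The point of the sketch is that the aggregate number hides the harm: a report like this, recomputed after each pruning or quantization step, surfaces group-level regressions that overall accuracy masks.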