The process of selectively using only a subset of a model's total parameters during inference, reducing computational cost while maintaining performance.