A technique in which only a small subset of a model's total parameters is used for any given input during inference, reducing computational cost while preserving the model's overall capacity.
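The definition does not name a mechanism, so the following is only a minimal sketch of one common way to get this behavior: a router scores a set of expert sub-networks and evaluates just the top `k` of them for each input. The names `make_expert`, `sparse_forward`, `router`, and the sizes used are all hypothetical, chosen for illustration.

```python
import math
import random


def make_expert(rng, dim):
    # Hypothetical expert: a single dim x dim weight matrix.
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]


def matvec(w, x):
    # Plain matrix-vector product.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def sparse_forward(x, experts, router, k):
    # One routing score per expert (dot product of the input with a router row).
    scores = [sum(r * xi for r, xi in zip(row, x)) for row in router]
    # Indices of the k highest-scoring experts; all others are skipped entirely.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected scores only, so the gate weights sum to 1.
    m = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - m) for i in top]
    z = sum(exps)
    gates = [e / z for e in exps]
    # Weighted sum of the outputs of the selected experts.
    out = [0.0] * len(x)
    for g, i in zip(gates, top):
        y = matvec(experts[i], x)
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, top


rng = random.Random(0)
dim, num_experts, k = 4, 8, 2  # illustrative sizes, not from the source
experts = [make_expert(rng, dim) for _ in range(num_experts)]
router = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]

x = [1.0, 0.5, -0.5, 2.0]
output, active = sparse_forward(x, experts, router, k)

# Expert parameters only; the small router matrix is not counted here.
total_params = num_experts * dim * dim
active_params = k * dim * dim
fraction_active = active_params / total_params
```

With these sizes, each input touches 2 of the 8 experts, so a quarter of the expert parameters take part in the forward pass while all 8 experts remain available to the router.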