A technique where only a subset of a model's weights are used for each input, rather than activating all parameters, which reduces memory usage and speeds up inference.