A mixture-of-experts design in which only a small fraction of the model's parameters is activated for each input, so the compute cost per prediction stays low while the model's total capacity remains large.
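The idea can be illustrated with a minimal sketch. The definition above does not say how experts are selected, so this example assumes top-k gating with a softmax over the selected scores, which is one common choice. The names `SparseMoE`, `top_k_route`, and `calls` are hypothetical, and the random linear experts are stand-ins for real sub-networks.

```python
import math
import random


def dot(w, x):
    """Dot product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def top_k_route(gate_scores, k):
    """Pick the k highest-scoring experts and softmax-normalize their scores.

    Assumption: top-k softmax gating; other routing schemes exist.
    """
    top = sorted(range(len(gate_scores)),
                 key=lambda i: gate_scores[i], reverse=True)[:k]
    peak = max(gate_scores[i] for i in top)  # subtract max for stability
    exps = [math.exp(gate_scores[i] - peak) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]


class SparseMoE:
    """Toy sparse layer: every expert is a random dim x dim linear map."""

    def __init__(self, num_experts, dim, k, seed=0):
        rng = random.Random(seed)
        self.k = k
        # One gating vector per expert (hypothetical linear router).
        self.gate = [[rng.gauss(0, 1) for _ in range(dim)]
                     for _ in range(num_experts)]
        # Expert parameters: all are stored, but only k are used per input.
        self.experts = [[[rng.gauss(0, 1) for _ in range(dim)]
                         for _ in range(dim)]
                        for _ in range(num_experts)]
        self.calls = [0] * num_experts  # how often each expert actually ran

    def forward(self, x):
        scores = [dot(g, x) for g in self.gate]
        chosen, weights = top_k_route(scores, self.k)
        out = [0.0] * len(x)
        for i, w in zip(chosen, weights):
            self.calls[i] += 1
            y = [dot(row, x) for row in self.experts[i]]  # run expert i only
            out = [o + w * yi for o, yi in zip(out, y)]
        return out


moe = SparseMoE(num_experts=8, dim=4, k=2)
y = moe.forward([1.0, 0.5, -0.5, 2.0])
print(sum(moe.calls), "of", len(moe.experts), "experts were evaluated")
```

With 8 experts and `k=2`, each input touches only a quarter of the expert parameters (plus the small router), while all 8 experts still contribute to the model's overall capacity across different inputs.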