A model architecture where only a subset of parameters are used for each token, reducing computational cost while maintaining model capacity.
Multi-step reasoning, logic puzzles, mathematical problem-solving