An architecture in which a model contains multiple specialized sub-networks (experts) and a routing (gating) mechanism activates only a few of them for each input, so the compute spent per input stays well below what the model's total parameter count would otherwise require.
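
As a rough illustration of the selective activation described above, here is a minimal sketch of top-k routing in plain Python. Everything in it is an assumption made for illustration: the names `make_expert`, `moe_forward`, and `gate_weights` are hypothetical, the "experts" are toy scaling functions standing in for real sub-networks, and renormalizing the gate scores over the selected experts is one common choice, not the only one.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical toy expert: scales its input (stand-in for a sub-network)."""
    return lambda x: [scale * v for v in x]


def moe_forward(x, gate_weights, experts, k=2):
    """Route x to the k highest-scoring experts and mix their outputs.

    gate_weights holds one row per expert; each gate logit is the dot
    product of that row with x (an assumed linear router).
    """
    logits = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    # Indices of the k experts with the largest gate logits.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalize the gate scores over the chosen experts only.
    weights = softmax([logits[i] for i in chosen])
    output = [0.0] * len(x)
    for weight, index in zip(weights, chosen):
        expert_out = experts[index](x)  # only the chosen experts are evaluated
        output = [o + weight * y for o, y in zip(output, expert_out)]
    return output, chosen


# Four toy experts, of which only two run for this input.
experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
output, chosen = moe_forward([2.0, 1.0], gate_weights, experts, k=2)
print(chosen)
```

In this sketch the model holds four experts' worth of parameters, but each input pays for only `k` expert evaluations plus the cost of the router, which is the efficiency argument the definition makes.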