Mixture of Experts (MoE)

2 Posts


Efficiency Experts: Mixture of Experts Makes Language Models More Efficient

The emerging generation of trillion-parameter language models takes significant computation to train. Activating only a portion of the network at a time cuts the requirement dramatically while still achieving exceptional results.

Bigger, Faster Transformers: Google's Switch Transformer uses MoE for Efficient NLP

Performance in language tasks rises with model size, yet as a model's parameter count grows, so does the time it takes to produce output. New work pumps up the number of parameters without slowing down the network.
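The routing idea behind this trick can be sketched in a few lines: a learned router sends each token to the single highest-scoring expert (top-1 routing, as in Switch Transformer), so per-token compute stays at one expert's worth of work no matter how many experts, and hence parameters, the layer holds. This is a minimal NumPy illustration with made-up names and sizes, not code from either paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, n_experts = 8, 32, 4

# One tiny feed-forward "expert" per slot; only the routed expert runs.
experts = [
    (rng.standard_normal((d_model, d_ff)) * 0.1,
     rng.standard_normal((d_ff, d_model)) * 0.1)
    for _ in range(n_experts)
]
router = rng.standard_normal((d_model, n_experts)) * 0.1


def switch_layer(tokens):
    """Route each token through its single highest-scoring expert."""
    logits = tokens @ router                        # (n_tokens, n_experts)
    gates = np.exp(logits - logits.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)      # softmax router probs
    choice = gates.argmax(axis=-1)                  # top-1 expert per token
    out = np.empty_like(tokens)
    for e in range(n_experts):
        idx = np.where(choice == e)[0]
        if idx.size == 0:
            continue
        w_in, w_out = experts[e]
        # Scale by the router probability so, in a trained model,
        # gradients would flow back to the router as well.
        h = np.maximum(tokens[idx] @ w_in, 0.0) @ w_out
        out[idx] = h * gates[idx, e, None]
    return out


tokens = rng.standard_normal((16, d_model))
y = switch_layer(tokens)
print(y.shape)  # (16, 8)
```

Note that adding more experts grows the parameter count, but each token still multiplies against exactly one expert's weights, which is why inference need not slow down.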
