Mixture of Experts (MoE)

2 Posts

Efficiency Experts

The emerging generation of trillion-parameter language models takes enormous computation to train. Activating only a portion of the network for each input can cut that requirement dramatically while still achieving exceptional results.
3 min read
[Image: graphs of Switch Transformer results]
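
To make the idea concrete, here is a minimal sketch of sparse expert routing in the spirit of the Switch Transformer: a router scores each token, and only the single highest-scoring expert runs for that token, so per-token computation stays roughly constant no matter how many experts exist. This is an illustrative toy (NumPy, top-1 gating); names such as moe_layer and router_weights are hypothetical, not taken from the article or paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, n_tokens = 16, 4, 8

# One small feed-forward "expert" per slot; only the routed expert runs per token.
expert_weights = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
router_weights = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_layer(tokens: np.ndarray) -> np.ndarray:
    """Route each token to its single highest-scoring expert (top-1 gating)."""
    logits = tokens @ router_weights                              # (n_tokens, n_experts)
    probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
    chosen = probs.argmax(-1)                                     # one expert index per token
    out = np.zeros_like(tokens)
    for e in range(n_experts):
        mask = chosen == e
        if mask.any():
            # Scale by the gate probability so the router would receive gradient in training.
            out[mask] = (tokens[mask] @ expert_weights[e]) * probs[mask, e:e + 1]
    return out

tokens = rng.standard_normal((n_tokens, d_model))
print(moe_layer(tokens).shape)  # (8, 16): same output shape, but each token used only 1 of 4 experts
```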

Bigger, Faster Transformers

Performance on language tasks rises with model size, yet so does the time it takes to generate output. New work increases the parameter count without slowing down the network.
2 min read
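
The rough accounting below illustrates why adding experts need not slow inference: total parameters grow with the number of experts, but the parameters actually touched per token stay fixed under top-1 routing. The dimensions are made-up illustrative values, not figures from the article.

```python
# Toy parameter accounting for a sparse feed-forward layer.
d_model, d_ff, n_experts = 1024, 4096, 64

dense_ffn_params = 2 * d_model * d_ff               # one standard feed-forward block
total_expert_params = n_experts * dense_ffn_params  # all experts held in memory
active_per_token = dense_ffn_params                 # top-1 routing runs just one expert

print(f"total expert parameters : {total_expert_params:,}")
print(f"active per token        : {active_per_token:,}")
```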
