GLaM

1 Post

Efficiency Experts: Mixture of Experts Makes Language Models More Efficient

The emerging generation of trillion-parameter language models take significant computation to train. Activating only a portion of the network at a time can cut the requirement dramatically and still achieve exceptional results.

GLaM

Efficiency Experts: Mixture of Experts Makes Language Models More Efficient

Subscribe to The Batch