1 Post

Graph related to Mixture of Softmaxes (MoS)

Upgrading Softmax: Mixtape is a faster way to avoid the softmax bottleneck.

Softmax commonly computes probabilities in a classifier’s output layer. But softmax isn’t always accurate in complex tasks — say, in a natural-language task, when the length of word vectors is much smaller than the number of words in the vocabulary.

Subscribe to The Batch

Stay updated with weekly AI News and Insights delivered to your inbox