Big Bird

3 Posts

Graph showing information about different transformer models
Big Bird

Transformer Variants Head to Head

The transformer architecture has inspired a plethora of variations. Yet researchers have used a patchwork of metrics to evaluate their performance, making them hard to compare. New work aims to level the playing field.
Graphs related to different attention mechanisms
Big Bird

More Efficient Transformers

As transformer networks move to the fore in applications from language to vision, the time it takes them to crunch longer sequences becomes a more pressing issue. A new method lightens the computational load using sparse attention.
Sesame Street characters together
Big Bird

Inside AI’s Muppet Empire: Why Are So Many NLP Models Named After Muppets?

As language models show increasing power, a parallel trend has received less notice: The vogue for naming models after characters in the children’s TV show Sesame Street.

Subscribe to The Batch

Stay updated with weekly AI News and Insights delivered to your inbox