RoBERTa

4 Posts


More Efficient Transformers

As transformer networks move to the fore in applications from language to vision, the time it takes them to crunch longer sequences becomes a more pressing issue. A new method lightens the computational load using sparse attention.
2 min read
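
Full self-attention compares every token with every other token, so its cost grows quadratically with sequence length; sparse attention restricts each token to a subset of positions. The sketch below is purely illustrative rather than the method the article covers: it assumes a sliding local window (size 4), and for clarity it still builds the full score matrix, whereas a real kernel would compute only the banded entries to realize the speedup.

```python
import numpy as np

def local_attention(q, k, v, window=4):
    """Sliding-window sparse attention: each query attends only to keys
    within `window` positions of it, rather than to all keys."""
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)                   # full logits, masked next
    offsets = np.arange(seq_len)
    banned = np.abs(offsets[:, None] - offsets[None, :]) > window
    scores[banned] = -np.inf                        # zero weight outside the band
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the band only
    return weights @ v

# Toy usage: 16 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))
print(local_attention(x, x, x).shape)               # (16, 8)
```
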

Yann LeCun: Learning From Observation

How is it that many people learn to drive a car fairly safely in 20 hours of practice, while current imitation learning algorithms take hundreds of thousands of hours, and reinforcement learning algorithms take millions of hours? Clearly we’re missing something big.
2 min read

Inside AI’s Muppet Empire

As language models show increasing power, a parallel trend has received less notice: The vogue for naming models after characters in the children’s TV show Sesame Street.
1 min read

BERT Is Back

Less than a month after XLNet overtook BERT, the pole position in natural language understanding changed hands again. RoBERTa, an improved recipe for pretraining BERT, beats its forebear and becomes the new state-of-the-art language model, at least for the moment.
2 min read
