RoBERTa

5 Posts


Language Models Defy Logic: Large NLP models struggle with logical reasoning.

Who would disagree that, if all people are mortal and Socrates is a person, Socrates must be mortal? GPT-3, for one. Recent work shows that bigger language models are not necessarily better when it comes to logical reasoning.

More Efficient Transformers: BigBird is an efficient attention mechanism for transformers.

As transformer networks move to the fore in applications from language to vision, the time it takes them to crunch longer sequences becomes a more pressing issue. A new method lightens the computational load using sparse attention.

Yann LeCun — Learning From Observation: The power of self-supervised learning

How is it that many people learn to drive a car fairly safely in 20 hours of practice, while current imitation learning algorithms take hundreds of thousands of hours, and reinforcement learning algorithms take millions of hours? Clearly we’re missing something big.

Inside AI’s Muppet Empire: Why Are So Many NLP Models Named After Muppets?

As language models show increasing power, a parallel trend has received less notice: The vogue for naming models after characters in the children’s TV show Sesame Street.

BERT Is Back

Less than a month after XLNet overtook BERT, the pole position in natural language understanding changed hands again. RoBERTa is an improved BERT pretraining recipe that beats its forebear, becoming the new state-of-the-art language model — for the moment.
