RoBERTa
More Efficient Transformers
As transformer networks move to the fore in applications from language to vision, the time they take to crunch longer sequences becomes a more pressing issue. A new method lightens the computational load using sparse attention.
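The idea behind sparse attention is to have each token attend to only a subset of positions rather than the full sequence, so compute grows roughly with sequence length times the number of attended positions instead of with the square of sequence length. The sketch below is a minimal illustration of one common sparse pattern, a fixed local window; the function name, window size, and use of NumPy are illustrative assumptions, not the specific method described in the article.

```python
import numpy as np

def local_sparse_attention(q, k, v, window=4):
    """Illustrative windowed self-attention: each position attends only to
    neighbors within `window` steps, so work scales with n * window
    rather than n**2 as in full attention.
    q, k, v: arrays of shape (n, d)."""
    n, d = q.shape
    out = np.zeros_like(v)
    for i in range(n):
        # Restrict keys and values to a local window around position i.
        lo, hi = max(0, i - window), min(n, i + window + 1)
        scores = q[i] @ k[lo:hi].T / np.sqrt(d)   # scaled dot-product scores
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                  # softmax over the window
        out[i] = weights @ v[lo:hi]               # weighted sum of windowed values
    return out

# Toy usage: 16 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))
y = local_sparse_attention(x, x, x, window=4)
print(y.shape)  # (16, 8)
```

Published sparse-attention methods typically combine patterns like this local window with a handful of global or random connections so that information can still flow across the whole sequence.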