Transformer-XL

2 Posts

[Image: Nvidia's Pay Attention When Required (PAR) approach]

Selective Attention: More efficient NLP training without sacrificing performance

Large transformer networks work wonders with natural language, but they require enormous amounts of computation. New research slashes processor cycles without compromising performance.
[Graph: Language Model Analysis (LAMA)]

What Language Models Know

Watson set a high bar for language understanding in 2011, when it famously whipped human champions on the televised trivia show Jeopardy! IBM's special-purpose AI cost an estimated $1 billion to develop. Research suggests that today's best language models can accomplish similar tasks right off the shelf.
