Pay Attention When Required (PAR)

Selective Attention: More efficient NLP training without sacrificing performance

Large transformer networks work wonders with natural language, but they require enormous amounts of computation. New research slashes processor cycles without compromising performance.
