d/dx Times Labs


Single Headed Attention RNN (SHA-RNN)
Language Modeling on One GPU: Single-headed attention competes with transformers.

The latest large, pretrained language models rely on multi-head attention layers from the transformer architecture. New research on the Single Headed Attention RNN (SHA-RNN) shows that these layers may not be necessary: an LSTM-based model with a single attention head can compete with transformer models while training on a single GPU.
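The core operation is ordinary scaled dot-product attention with one head rather than many. The sketch below, in NumPy, shows that single step with a causal mask; it is a minimal illustration of the idea, not the SHA-RNN authors' implementation, and the function name and shapes are assumptions for this example.

```python
import numpy as np

def single_head_attention(q, k, v):
    """Causal scaled dot-product attention with a single head.

    q, k, v: arrays of shape (seq_len, d_model). A minimal sketch of
    the attention step used alongside an RNN, not the paper's code.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)               # (seq_len, seq_len)
    # causal mask: each position attends only to itself and the past
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    # numerically stable softmax over the last axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                          # (seq_len, d_model)

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))
out = single_head_attention(x, x, x)
print(out.shape)  # (5, 8)
```

Because of the causal mask, the first output position can attend only to itself, so `out[0]` equals `v[0]` exactly; later positions mix information from all earlier tokens.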
