d/dx Times Labs


Single Headed Attention RNN (SHA-RNN)

Language Modeling on One GPU

The latest large, pretrained language models rely on trendy layers based on transformer networks. New research shows that these newfangled layers may not be necessary.
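The core idea behind the SHA-RNN is to pair an LSTM with a single attention head rather than the transformer's stack of multi-head attention layers. As a rough illustration, here is a minimal sketch of single-head scaled dot-product attention in numpy; the function names and shapes are illustrative, not the paper's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def single_head_attention(q, k, v):
    # One attention head: scaled dot-product scores over all keys,
    # then a weighted sum of the values. q, k, v: (seq_len, dim).
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)      # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v                  # (seq_len, dim)

# Tiny usage example with illustrative shapes
q = np.eye(3)
k = np.eye(3)
v = np.arange(9.0).reshape(3, 3)
out = single_head_attention(q, k, v)
```

A multi-head layer would run several such heads in parallel and concatenate their outputs; the SHA-RNN's bet is that one head, combined with a strong recurrent backbone, captures most of the benefit at a fraction of the cost.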
