Single Headed Attention RNN (SHA-RNN)

Language Modeling on One GPU

The latest large, pretrained language models rely on trendy layers based on transformer networks. New research shows that these newfangled layers may not be necessary.
