Single Headed Attention RNN (SHA-RNN)

Language Modeling on One GPU

The latest large, pretrained language models rely on trendy layers based on transformer networks. New research shows that these newfangled layers may not be necessary.
