Single Headed Attention RNN (SHA-RNN)
Language Modeling on One GPU
The latest large pretrained language models rely on trendy multi-headed attention layers from Transformer networks. New research suggests these newfangled layers may not be necessary.
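To make the idea concrete, below is a minimal sketch of the core building block: a recurrent layer followed by a single attention head and a feed-forward "Boom" layer, as described at a high level in the SHA-RNN paper. The class name `SHABlock`, the parameter names, and the dimensions are illustrative assumptions, not the author's actual implementation.

```python
# A minimal, illustrative single-headed-attention RNN block in PyTorch.
# Structure (LSTM -> one attention head -> "Boom" feed-forward) follows the
# SHA-RNN paper's high-level description; names and sizes are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SHABlock(nn.Module):
    def __init__(self, d_model: int, d_ff: int = 2048):
        super().__init__()
        self.rnn = nn.LSTM(d_model, d_model, batch_first=True)
        # A single attention head: one query/key/value projection each,
        # with no multi-head splitting or head concatenation.
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        # "Boom" feed-forward: expand to d_ff, then project back down.
        self.boom = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        h, _ = self.rnn(x)
        q, k, v = self.q(h), self.k(h), self.v(h)
        scores = q @ k.transpose(-2, -1) / (h.size(-1) ** 0.5)
        # Causal mask: each position attends only to itself and the past.
        t = h.size(1)
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool, device=h.device), 1)
        scores = scores.masked_fill(mask, float("-inf"))
        h = self.ln1(h + F.softmax(scores, dim=-1) @ v)
        return self.ln2(h + self.boom(h))


# Usage: one forward pass over a toy batch.
block = SHABlock(d_model=128)
out = block(torch.randn(2, 16, 128))  # -> (2, 16, 128)
```

The point of the design is that the recurrent layer carries most of the sequence modeling, so a single attention head (rather than a full multi-head Transformer stack) suffices, keeping the model small enough to train on one GPU.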