Text-to-Text Transfer Transformer (T5)

[Figure: graphs showing Switch Transformer data]
Bigger, Faster Transformers

Performance in language tasks rises with the size of the model. Yet as a model’s parameter count rises, so does the time it takes to produce output. New work increases the number of parameters without slowing down the network.
